Bounded Independence Fools Degree-2 Threshold Functions 

Ilias Diakonikolas^ Daniel M. Kane^ Jelani Nelson^ 

ilias@cs.columbia.edu dankane@math.harvard.edu minilek@mit.edu



Abstract 

Let $x$ be a random vector coming from any $k$-wise independent distribution over $\{-1,1\}^n$. For
an $n$-variate degree-2 polynomial $p$, we prove that $E[\mathrm{sgn}(p(x))]$ is determined up to an additive
$\epsilon$ for $k = \mathrm{poly}(1/\epsilon)$. This answers an open question of Diakonikolas et al. (FOCS 2009). Using
standard constructions of $k$-wise independent distributions, we obtain a broad class of explicit
generators that $\epsilon$-fool the class of degree-2 threshold functions with seed length $\log n \cdot \mathrm{poly}(1/\epsilon)$.

Our approach is quite robust: it easily extends to yield that the intersection of any constant
number of degree-2 threshold functions is $\epsilon$-fooled by $\mathrm{poly}(1/\epsilon)$-wise independence. Our results
also hold if the entries of $x$ are $k$-wise independent standard normals, implying for example that
bounded independence derandomizes the Goemans-Williamson hyperplane rounding scheme.

To achieve our results, we introduce a technique we dub multivariate FT-mollification, a 
generalization of the univariate form introduced by Kane et al. (SODA 2010) in the context 
of streaming algorithms. Along the way we prove a generalized hypercontractive inequality for 
quadratic forms which takes the operator norm of the associated matrix into account. These 
techniques may be of independent interest. 

1 Introduction

This paper is concerned with the power of limited independence to fool low-degree polynomial
threshold functions. A degree-$d$ polynomial threshold function (henceforth PTF) is a boolean
function $f : \{-1,1\}^n \to \{-1,1\}$ expressible as $f(x) = \mathrm{sgn}(p(x))$, where $p$ is an $n$-variate degree-$d$
polynomial with real coefficients, and sgn is $-1$ for negative arguments and $1$ otherwise. PTFs
have played an important role in computer science since the early perceptron work of Minsky and
Papert [31], and have since been extensively investigated in circuit complexity and communication
complexity [lElliailllliaillllSHllMlESlETlEH], learning theory [Ml [271 EH] , and more. 
A distribution $\mathcal{D}$ on $\{-1,1\}^n$ is said to $\epsilon$-fool a function $f : \{-1,1\}^n \to \{-1,1\}$ if

$|E_{x \sim \mathcal{D}}[f(x)] - E_{x \sim \mathcal{U}}[f(x)]| \le \epsilon$

where $\mathcal{U}$ is the uniform distribution on $\{-1,1\}^n$. A distribution $\mathcal{D}$ on $\{-1,1\}^n$ is $k$-wise independent
if every restriction of $\mathcal{D}$ to $k$ coordinates is uniform on $\{-1,1\}^k$. Despite their simplicity, $k$-wise
independent distributions have been a surprisingly powerful and versatile derandomization tool,
fooling complex functions such as $AC^0$ circuits [4, 36, 9] and half-spaces [14]. As a result, this class
of distributions has played a fundamental role in many areas of theoretical computer science.
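To make the two definitions concrete, here is a minimal brute-force sketch in Python. The toy distribution (three bits with $x_3 = x_1 x_2$) and the helper names `is_k_wise_independent` and `fooling_error` are illustrative assumptions, not constructions from the paper. The sketch checks that this distribution is 2-wise but not 3-wise independent, and measures how well it fools a halfspace and a degree-3 PTF.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations, product


def is_k_wise_independent(support, k):
    """True if the uniform distribution over `support` (points of {-1,1}^n)
    restricts to the uniform distribution on every set of k coordinates."""
    n = len(support[0])
    target = Fraction(len(support), 2 ** k)  # required count of each pattern
    for coords in combinations(range(n), k):
        counts = Counter(tuple(pt[i] for i in coords) for pt in support)
        if any(counts[pat] != target for pat in product((-1, 1), repeat=k)):
            return False
    return True


def sgn(v):
    # Convention from the text: -1 for negative arguments, 1 otherwise.
    return -1 if v < 0 else 1


def fooling_error(support, f, n):
    """|E_D[f] - E_U[f]| for D uniform over `support`, U uniform on {-1,1}^n."""
    e_d = Fraction(sum(f(pt) for pt in support), len(support))
    cube = list(product((-1, 1), repeat=n))
    e_u = Fraction(sum(f(pt) for pt in cube), len(cube))
    return abs(e_d - e_u)


# Hypothetical toy distribution: x1, x2 uniform and x3 = x1 * x2.
support = [(a, b, a * b) for a in (-1, 1) for b in (-1, 1)]
halfspace = lambda x: sgn(x[0] + x[1] + x[2])   # a degree-1 PTF
parity = lambda x: sgn(x[0] * x[1] * x[2])      # a degree-3 PTF

print(is_k_wise_independent(support, 2), is_k_wise_independent(support, 3))
print(fooling_error(support, halfspace, 3), fooling_error(support, parity, 3))
```

The example only illustrates the definitions on three variables; it says nothing about the quantitative bounds studied in the paper.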

Our Results. The problem we study is the following: How large must $k = k(n, d, \epsilon)$ be in order for
every $k$-wise independent distribution on $\{-1,1\}^n$ to $\epsilon$-fool the class of degree-$d$ PTFs? The $d = 1$
case of this problem was recently considered in [14], where it was shown that k{n, l,e) = 0(l/e^), 
independent of $n$, with an alternative proof to much of the argument given in [25]. The main open
problem in [14] was to identify $k = k(n, d, \epsilon)$ for $d \ge 2$. In this work, we make progress on this
question by proving the following:

Theorem 1.1. Any J7(e~^)-wise independent distribution on { — 1, 1}" e-fools all degree-2 PTFs. 

Prior to this work, no nontrivial result was known for $d > 1$; it was not even known whether
$o(n)$-wise independence suffices for constant $\epsilon$. Using known constructions of $k$-wise independent
distributions [H [13], Theorem 1.1 gives a large class of pseudo-random generators (PRGs) for
degree-2 PTFs with seed length log(n) • 0{e~^). 

Our techniques are quite robust. Our approach yields for example that Theorem 1.1 holds not
only over the hypercube, but also over the $n$-variate Gaussian distribution. This already implies that
the Goemans-Williamson hyperplane rounding scheme [18] (henceforth "GW rounding") can be
derandomized using $\mathrm{poly}(1/\epsilon)$-wise independence^1. Our technique also readily extends to show that
the intersection of $m$ halfspaces, or even $m$ degree-2 threshold functions, is $\epsilon$-fooled by $\mathrm{poly}(1/\epsilon)$-wise
independence for any constant $m$ (over both the hypercube and the multivariate Gaussian).
One consequence of this is that 0(l/e^) -wise independence suffices for GW rounding. 

Another consequence of Theorem 1.1 is that bounded independence suffices for the invariance
principle of Mossel, O'Donnell, and Oleszkiewicz in the case of degree-2 polynomials. Let $p(x)$ be an
$n$-variate degree-2 multilinear polynomial with "low influences". The invariance principle roughly
says that the distribution of $p$ is essentially invariant if $x$ is drawn from the uniform distribution
on $\{-1,1\}^n$ versus the standard $n$-dimensional Gaussian distribution $\mathcal{N}(0,1)^n$. Our result implies
that the $x_i$'s do not need to be fully independent for the invariance principle to apply, but that
bounded independence suffices.

^1 We note that other derandomizations of GW rounding are known with better dependence on $\epsilon$, though not solely
using $k$-wise independence; see [29, 40].

Motivation and Related Work. The literature is rich with explicit generators for various
natural classes of functions. Recently, there has been much interest in not only constructing PRGs
for natural complexity classes, but also in doing so with as broad and natural a family of PRGs
as possible. One example is the recent work of Bazzi [4] on fooling depth-2 circuits (simplified by
Razborov [36]), and of Braverman [9] on fooling $AC^0$, with bounded independence^2.

Simultaneously and independently from our work, Meka and Zuckerman [30] constructed PRGs 
against degree-d PTFs with seed length logn • 2^^'^'> ■ [30]. That is, their seed length for 

$d = 2$ is similar to ours (though worse by a $\mathrm{poly}(1/\epsilon)$ factor). However, their result is incomparable
to ours since their pseudorandom generator is customized for PTFs, and not based on $k$-wise
independence alone. We believe that the ideas in our proof may lead to generators with better
seed-length^3, and that some of the techniques we introduce are of independent interest.

In other recent and independent works, [20, 23] give PRGs for intersections of $m$ halfspaces
(though not degree-2 threshold functions). The former has polynomial dependence on $m$ and
requires only bounded independence as well (and considers other functions of halfspaces besides
intersections), while the latter has poly-logarithmic dependence on $m$ under the Gaussian measure
but is not solely via bounded independence. Our dependence on $m$ is polynomial.

2 Notation 

Let $p : \{-1,1\}^n \to \mathbb{R}$ be a polynomial and $p(x) = \sum_{S \subseteq [n]} \hat{p}_S \chi_S$ be its Fourier-Walsh expansion,
where $\chi_S(x) := \prod_{i \in S} x_i$. The influence of variable $i$ on $p$ is $\mathrm{Inf}_i(p) := \sum_{S \ni i} \hat{p}_S^2$, and the total
influence of $p$ is $\mathrm{Inf}(p) = \sum_{i=1}^n \mathrm{Inf}_i(p)$. If $\mathrm{Inf}_i(p) \le \tau \cdot \mathrm{Inf}(p)$ for all $i$, we say that the polynomial
$p$ is $\tau$-regular. If $f(x) = \mathrm{sgn}(p(x))$, where $p$ is $\tau$-regular, we say that $f$ is a $\tau$-regular PTF.
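As a small illustration of these definitions, the following Python sketch computes influences and the smallest regularity parameter $\tau$ from a table of Fourier coefficients. The dictionary representation (a tuple of variable indices mapped to a coefficient) and the two example polynomials are assumptions made for this sketch only.

```python
from fractions import Fraction


def influences(coeffs, n):
    """coeffs maps a tuple of variable indices S to the coefficient of chi_S.
    Returns [Inf_0(p), ..., Inf_{n-1}(p)] with Inf_i(p) = sum over S containing i
    of the squared coefficient."""
    inf = [Fraction(0) for _ in range(n)]
    for S, c in coeffs.items():
        for i in S:
            inf[i] += Fraction(c) ** 2
    return inf


def regularity(coeffs, n):
    """Smallest tau for which the polynomial is tau-regular."""
    inf = influences(coeffs, n)
    return max(inf) / sum(inf)


# Hypothetical example: p(x) = x0*x1 + x1*x2 + x2*x3 + x0*x3 on 4 variables.
cycle = {(0, 1): 1, (1, 2): 1, (2, 3): 1, (0, 3): 1}
# Hypothetical example dominated by one variable: p(x) = 3*x0 + x1*x2.
skewed = {(0,): 3, (1, 2): 1}

print(influences(cycle, 4), regularity(cycle, 4))
print(influences(skewed, 3), regularity(skewed, 3))
```

In the first example every variable carries the same influence, so the polynomial is as regular as a 4-variable polynomial can be; in the second, a single variable carries most of the total influence.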

For $R \subseteq \mathbb{R}^d$ denote by $I_R : \mathbb{R}^d \to \{0,1\}$ its characteristic function. It will be convenient in some
of the proofs to phrase our results in terms of $\epsilon$-fooling $E[I_{[0,\infty)}(p(x))]$ as opposed to $E[\mathrm{sgn}(p(x))]$.
It is straightforward that these are equivalent up to changing $\epsilon$ by a factor of 2.

We frequently use $A \approx_\epsilon B$ to denote that $|A - B| = O(\epsilon)$, and we let the function $d_2(x, R)$
denote the $L_2$ distance from some $x \in \mathbb{R}^d$ to a region $R \subseteq \mathbb{R}^d$.

3 Overview of our proof of Theorem 1.1

The program of our proof follows the outline of the proof in [14]. We first prove that bounded
independence fools the class of regular degree-2 PTFs. We then reduce the general case to the
regular case to show that bounded independence fools all degree-2 PTFs. The bulk of our proof
is to establish the first step; this is the most challenging part of this work and where our main
technical contribution lies. The second step is achieved by adapting the recent results of [15].

We now elaborate on the first step. Let $f : \{-1,1\}^n \to \{-1,1\}$ be a boolean function. To
show that $f$ is fooled by $k$-wise independence, it suffices - and is in fact necessary - to prove the
existence of two degree-$k$ "sandwiching" polynomials $q_u, q_l : \{-1,1\}^n \to \mathbb{R}$ that approximate
$f$ in a certain technical sense (see e.g. [4, 7]). Even though this is an $n$-dimensional approximation
problem, it may be possible to exploit the additional structure of the function under consideration
to reduce it to a low-dimensional problem. This is exactly what is done in both [14] and [25] for
the case of regular halfspaces.
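For concreteness, the "technical sense" is usually stated as below. This is a sketch of the standard formulation of sandwiching, not a quotation from the cited works; $\mathcal{U}$ denotes the uniform distribution and $\mathcal{D}$ any $k$-wise independent distribution, and treating $q_l, q_u$ as real-valued is an assumption of this sketch.

```latex
% Sandwiching polynomials of degree at most k:
\[
  q_l(x) \le f(x) \le q_u(x) \ \text{ for all } x \in \{-1,1\}^n,
  \qquad
  E_{\mathcal{U}}[q_u - f] \le \epsilon,
  \qquad
  E_{\mathcal{U}}[f - q_l] \le \epsilon .
\]
% A polynomial of degree at most k has the same expectation under D and U,
% since each of its monomials touches at most k coordinates. Hence
\begin{align*}
  E_{\mathcal{D}}[f] &\le E_{\mathcal{D}}[q_u] = E_{\mathcal{U}}[q_u] \le E_{\mathcal{U}}[f] + \epsilon, \\
  E_{\mathcal{D}}[f] &\ge E_{\mathcal{D}}[q_l] = E_{\mathcal{U}}[q_l] \ge E_{\mathcal{U}}[f] - \epsilon .
\end{align*}
```

Together the two chains give $|E_{\mathcal{D}}[f] - E_{\mathcal{U}}[f]| \le \epsilon$, which is the sufficiency direction mentioned above.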

^2 Note that a PRG for $AC^0$ with qualitatively similar - in fact slightly better - seed length had already been given
by Nisan [55].

^3 An easy probabilistic argument shows that there exist PRGs for degree-$d$ PTFs with seed-length $O(d \log(n/\epsilon))$.

We now briefly explain the approaches of [14] and [25]. Let $f(x) = \mathrm{sgn}(\langle w, x \rangle)$ be an $\epsilon^2$-regular
halfspace, i.e. $\|w\|_2 = 1$ and $\max_i |w_i| \le \epsilon$. An insight used in [14] (and reused in [25]) is the
following: the random variable $\langle w, x \rangle$ behaves approximately like a standard Gaussian, hence it
can be treated as if it was one-dimensional. Thus, both [14] and [25] construct a (different in
each case) univariate polynomial $P : \mathbb{R} \to \mathbb{R}$ that is a "good" approximation to the sign function
under the normal distribution in $\mathbb{R}$ (in the case of [25], the main point of the alternative proof was
to avoid explicitly reasoning about any such polynomials, but the existence of such a polynomial
is still implicit in the proof). The desired $n$-variate sandwiching polynomials are then obtained
(roughly) by setting $q_u(x) = P(\langle w, x \rangle)$ and $q_l(x) = -P(-\langle w, x \rangle)$. It turns out that this approach
suffices for the case of halfspaces. In [14] the polynomial $P$ is constructed using approximation
theory arguments. In [25] it is obtained by taking a truncated Taylor expansion of a certain
smooth approximation to the sign function, constructed via a method dubbed "Fourier Transform
mollification" (henceforth FT-mollification). We elaborate in Section 3.1 below.

Let $f(x) = \mathrm{sgn}(p(x))$ be a regular degree-2 PTF. A first natural attempt to handle this case
would be to use the univariate polynomial $P$ described above - potentially allowing its degree to
increase - and then take $q_u(x) = P(p(x))$, as before. Unfortunately, such an approach fails for both
constructions outlined above. We elaborate on this issue in Section C.

3.1 FT-mollification

FT-mollification is a general procedure to obtain a smooth function with
bounded derivatives that approximates some bounded function $f$. The univariate version of the
method in the context of derandomization was introduced in [25]. In this paper we generalize it to
the multivariate setting and later use it to prove our main theorem.

For the univariate case, where $f : \mathbb{R} \to \mathbb{R}$, [25] defined $\tilde{f}^c(x) = (c \cdot \hat{b}(c \cdot t) * f(t))(x)$ for a parameter
$c$, where $\hat{b}$ has unit integral and is the Fourier transform of a smooth function $b$ of compact support
(a so-called bump function). Here "$*$" denotes convolution. The idea of smoothing functions
via convolution with a smooth approximation of the Dirac delta function is old, dating back to
"Friedrichs mollifiers" [17] in 1944. Indeed, the only difference between Friedrichs mollification and
FT-mollification is that in the former, one convolves $f$ with the scaled bump function, and not
its Fourier transform. The switch to the Fourier transform is made to have better control on the
high-order derivatives of the resulting smooth function, which is crucial for our application.

In our context, the method can be illustrated as follows. Let $X = \sum_i a_i X_i$ for independent
$X_i$. Suppose we would like to argue that $E[f(X)] \approx_\epsilon E[f(Y)]$, where $Y = \sum_i a_i Y_i$ for $k$-wise
independent $Y_i$'s that are individually distributed as the $X_i$. Let $\tilde{f}^c$ be the FT-mollified version of
$f$. If the parameter $c = c(\epsilon)$ is appropriately selected, we can guarantee that $|f(x) - \tilde{f}^c(x)| < \epsilon$
"almost everywhere", and furthermore have "good" upper bounds on the high-order derivatives
of $\tilde{f}^c$. We could then hope to show the following chain of inequalities: $E[f(X)] \approx_\epsilon E[\tilde{f}^c(X)] \approx_\epsilon
E[\tilde{f}^c(Y)] \approx_\epsilon E[f(Y)]$. To justify the first inequality, note $f$ and $\tilde{f}^c$ are close almost everywhere, and
so it suffices to argue that $X$ is sufficiently anti-concentrated in the small region where they are not
close. The second inequality would use Taylor's theorem, bounding the error via upper bounds on
moment expectations of $X$ and the high-order derivatives of $\tilde{f}^c$. Showing the final inequality would
be similar to the first, except that one needs to justify that even under $k$-wise independence the
distribution of $Y$ is sufficiently anti-concentrated. We note that the argument outlined above was
used in [25] to provide an alternative proof that bounded independence fools regular halfspaces,
and to optimally derandomize Indyk's moment estimation algorithm in data streams [23]. However,
this univariate approach fails for degree-2 PTFs for technical reasons (see Section C).
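The three-step chain is an instance of the triangle inequality. Written out, with each term labeled by the tool the outline above assigns to it (a restatement for readability, not an additional claim):

```latex
\begin{align*}
  \bigl| E[f(X)] - E[f(Y)] \bigr|
  \le{}& \bigl| E[f(X)] - E[\tilde{f}^c(X)] \bigr|
       && \text{(anti-concentration of } X\text{)} \\
  &+ \bigl| E[\tilde{f}^c(X)] - E[\tilde{f}^c(Y)] \bigr|
       && \text{(Taylor's theorem, moment and derivative bounds)} \\
  &+ \bigl| E[\tilde{f}^c(Y)] - E[f(Y)] \bigr|
       && \text{(anti-concentration of } Y \text{ under } k\text{-wise independence)} .
\end{align*}
```

Each of the three terms must be $O(\epsilon)$ for the argument to go through.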

We now describe our switch to multivariate FT-mollification. Let $f : \{-1,1\}^n \to \{-1,1\}$ be
arbitrary and let $S \subseteq \mathbb{R}^n$ with $f^{-1}(1) \subseteq S \subseteq \mathbb{R}^n \setminus f^{-1}(-1)$. Then fooling $E[f(x)]$ and fooling
$E[I_S(x)]$ are equivalent. A natural attempt to this end would be to generalize FT-mollification to $n$
dimensions, then FT-mollify $I_S$ and argue as above using the multivariate Taylor's theorem. Such
an approach is perfectly valid, but as one might expect, there is a penalty for working over high
dimensions. Both our quantitative bounds on the error introduced by FT-mollifying, and the error
coming from the multivariate Taylor's theorem, increase with the dimension. Our approach is then
to find a low-dimensional representation of such a region $S$ which allows us to obtain the desired
bounds. We elaborate below on how this can be accomplished in our setting.

3.2 Our Approach

Let $f = \mathrm{sgn}(p)$ be a regular multilinear degree-2 PTF with $\|p\|_2 = 1$
(wlog). Let us assume for simplicity that $p$ is a quadratic form; handling the additive linear form
and constant is easy. The first conceptual step in our proof is this: we decompose $p$ as $p_1 - p_2 + p_3$,
where $p_1, p_2$ are positive semidefinite quadratic forms with no small non-zero eigenvalues and $p_3$
is indefinite with all eigenvalues small in magnitude. This decomposition, whose existence follows
from elementary linear algebra, is particularly convenient for the following reason: for $p_1, p_2$, we are
able to exploit their positive semidefiniteness to obtain better bounds from Taylor's theorem, and
for $p_3$ we can establish moment bounds that are strictly stronger than the ones that follow from
hypercontractivity for general quadratic forms (our Theorem 5.1, which may be of independent
interest). The fact that we need $p_1, p_2$ to not only be positive semidefinite, but to also have no
small eigenvalues, arises for technical reasons; specifically, quadratic forms with no small non-zero
eigenvalues satisfy much better tail bounds, which plays a role in our analysis.
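One natural way to realize such a decomposition is to split the spectrum of the matrix of $p$ at a threshold. The sketch below assumes an eigendecomposition $A_p = \sum_i \lambda_i v_i v_i^T$ with orthonormal $v_i$ and a threshold $\delta > 0$; the paper defers its actual construction to an appendix lemma, which may differ in detail.

```latex
\begin{align*}
  A_{p_1} &= \sum_{i \,:\, \lambda_i \ge \delta} \lambda_i \, v_i v_i^T ,
  &
  A_{p_2} &= \sum_{i \,:\, \lambda_i \le -\delta} |\lambda_i| \, v_i v_i^T ,
  &
  A_{p_3} &= \sum_{i \,:\, |\lambda_i| < \delta} \lambda_i \, v_i v_i^T ,
\end{align*}
\[
  A_p = A_{p_1} - A_{p_2} + A_{p_3} .
\]
```

With this choice $A_{p_1}$ and $A_{p_2}$ are positive semidefinite with every non-zero eigenvalue at least $\delta$, while every eigenvalue of $A_{p_3}$ has magnitude below $\delta$.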

We now proceed to describe the second conceptual step of the proof, which involves multivariate
FT-mollification. As suggested by the aforementioned, we would like to identify a region $R \subseteq \mathbb{R}^3$
such that $I_{[0,\infty)}(p(x))$ can be written as $I_R(F(x))$ for some $F : \{-1,1\}^n \to \mathbb{R}^3$ that depends on the
$p_i$, then FT-mollify $I_R$. The region $R$ is selected as follows: note we can write $p_3(x) = x^T A_{p_3} x$,
where $A_{p_3}$ is a real symmetric matrix with trace $T$. We consider the region $R = \{x : x_1^2 - x_2^2 +
x_3 + T \ge 0\} \subseteq \mathbb{R}^3$. Observe that $I_{[0,\infty)}(p(x)) = I_R(\sqrt{p_1(x)}, \sqrt{p_2(x)}, p_3(x) - T)$. (Recall that $p_1, p_2$
are positive semidefinite, hence the first two coordinates are always real.) We then prove via
FT-mollification that $E[I_R(\sqrt{p_1(x)}, \sqrt{p_2(x)}, p_3(x) - T)]$ is preserved within $\epsilon$ by bounded independence.

The high-level argument is of similar flavor as the one outlined above for the case of halfspaces,
but the details are more elaborate. The proof makes essential use of good tail bounds for $p_1, p_2$, a
new moment bound for $p_3$, properties of FT-mollification, and a variety of other tools such as the
Invariance Principle [32] and the anti-concentration bounds of [12].

Organization. Section 4 contains the results we will need on multivariate FT-mollification. In
Section 5 we give our improved moment bound on quadratic forms. Section 6 contains the analysis
of the regular case, and Section 7 concludes the proof of our main theorem. Section 8 summarizes
our results on intersections.

4 Multivariate FT-mollification 

Definition 4.1. In hyperspherical coordinates in $\mathbb{R}^d$, we represent a point $x = (x_1, \ldots, x_d)$ by
$x_i = r \cos(\phi_i) \prod_{j=1}^{i-1} \sin(\phi_j)$ for $i < d$, and $x_d = r \prod_{j=1}^{d-1} \sin(\phi_j)$. Here $r = \|x\|_2$ and the $\phi_i$ satisfy
$0 \le \phi_i \le \pi$ for $i < d - 1$, and $0 \le \phi_{d-1} < 2\pi$.

Fact 4.2. Let $J$ be the Jacobian matrix corresponding to the change of variables from Cartesian
to hyperspherical coordinates. Then

$\det(J) = r^{d-1} \prod_{i=1}^{d-2} \sin^{d-1-i}(\phi_i).$


We define the bump function $b : \mathbb{R}^d \to \mathbb{R}$ by

$b(x) = \sqrt{C_d} \cdot (1 - \|x\|_2^2)$ for $\|x\|_2 < 1$, and $b(x) = 0$ otherwise.

The value $C_d$ is chosen so that $\|b\|_2^2 = 1$. We note that $b$ is not smooth (its mixed partials do not
exist at the boundary of the unit sphere), but we will only ever need that $\frac{\partial}{\partial x_i} b \in L^2(\mathbb{R}^d)$ for all
$i \in [d]$.

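Henceforth, we make the setting

$A_d = C_d \cdot \int_0^{2\pi} \int_0^{\pi} \cdots \int_0^{\pi} \prod_{i=1}^{d-2} \sin^{d-1-i}(\phi_i) \, d\phi_1 d\phi_2 \cdots d\phi_{d-1}.$

We let $\hat{b} : \mathbb{R}^d \to \mathbb{R}$ denote the Fourier transform of $b$, i.e.

$\hat{b}(t) = \frac{1}{(\sqrt{2\pi})^d} \int_{\mathbb{R}^d} b(x) e^{-i \langle x, t \rangle} dx.$

Finally, $B : \mathbb{R}^d \to \mathbb{R}$ denotes the function $\hat{b}^2$, and we define $B_c : \mathbb{R}^d \to \mathbb{R}$ by

$B_c(x_1, \ldots, x_d) = c^d \cdot B(c x_1, \ldots, c x_d).$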

Definition 4.3 (Multivariate FT-mollification). For $F : \mathbb{R}^d \to \mathbb{R}$ and given $c > 0$, we define the
FT-mollification $\tilde{F}^c : \mathbb{R}^d \to \mathbb{R}$ by

$\tilde{F}^c(x) = (B_c * F)(x) = \int_{\mathbb{R}^d} B_c(y) F(x - y) dy.$

In this section we give several quantitative properties of FT-mollification. We start off with a
few lemmas that will be useful later.



Lemma 4.4. For any $c > 0$,

$\int_{\mathbb{R}^d} B_c(x) dx = 1.$

Proof. Since $B = \hat{b}^2$, the stated integral when $c = 1$ is $\|\hat{b}\|_2^2$, which is $\|b\|_2^2 = 1$ by Plancherel's
theorem. For general $c$, make the change of variables $u = (c x_1, \ldots, c x_d)$ then integrate over $u$. ■
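Lemma 4.4 can be sanity-checked numerically in dimension $d = 1$, where everything has a closed form. The sketch below is an illustration only and assumes the $d = 1$ specialization worked out here: $\int_{-1}^{1} (1 - x^2)^2 dx = 16/15$, so $C_1 = 15/16$, and $\hat{b}(t) = \sqrt{C_1 / (2\pi)} \cdot 4 (\sin t - t \cos t)/t^3$. The integration window and step counts are arbitrary choices.

```python
import math

C1 = 15.0 / 16.0  # normalizes b(x) = sqrt(C1) * (1 - x^2) on [-1, 1] to unit L2 norm


def b_hat(t):
    """Fourier transform (unitary convention) of the d = 1 bump function."""
    if abs(t) < 1e-3:
        core = 1.0 / 3.0 - t * t / 30.0  # Taylor expansion of (sin t - t cos t) / t^3
    else:
        core = (math.sin(t) - t * math.cos(t)) / t ** 3
    return math.sqrt(C1 / (2.0 * math.pi)) * 4.0 * core


def B(t):
    return b_hat(t) ** 2


def simpson(f, lo, hi, steps):
    """Composite Simpson rule; `steps` must be even."""
    h = (hi - lo) / steps
    total = f(lo) + f(hi)
    for j in range(1, steps):
        total += (4 if j % 2 else 2) * f(lo + j * h)
    return total * h / 3.0


# B decays like 1/t^4, so truncating the real line to [-200, 200] loses very little.
integral_B = simpson(B, -200.0, 200.0, 80000)

# Scaled kernel B_c(t) = c * B(c * t) for an arbitrary c; the integral should not change.
c = 3.0
integral_Bc = simpson(lambda t: c * B(c * t), -100.0, 100.0, 80000)

print(integral_B, integral_Bc)
```

Both printed values should be close to 1, matching the lemma for $c = 1$ and for a rescaled kernel.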

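Before presenting the next lemma, we familiarize the reader with some multi-index notation. A
$d$-dimensional multi-index is a vector $\beta \in \mathbb{N}^d$ (here $\mathbb{N}$ is the nonnegative integers). For $\alpha, \beta \in \mathbb{N}^d$,
we say $\alpha \le \beta$ if the inequality holds coordinate-wise, and for such $\alpha, \beta$ we define $|\beta| = \sum_{i=1}^d \beta_i$,
$\binom{\beta}{\alpha} = \prod_{i=1}^d \binom{\beta_i}{\alpha_i}$, and $\beta! = \prod_{i=1}^d \beta_i!$. For $x \in \mathbb{R}^d$ we use $x^\beta$ to denote $\prod_{i=1}^d x_i^{\beta_i}$, and for $f : \mathbb{R}^d \to \mathbb{R}$
we use $\partial^\beta f$ to denote $\frac{\partial^{|\beta|}}{\partial x_1^{\beta_1} \cdots \partial x_d^{\beta_d}} f$.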
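
Lemma 4.5. For any $\beta \in \mathbb{N}^d$, $\|\partial^\beta B\|_1 \le 2^{|\beta|}$.

Proof. We have

$\partial^\beta B = \partial^\beta (\hat{b}^2) = \sum_{\alpha \le \beta} \binom{\beta}{\alpha} (\partial^\alpha \hat{b}) \cdot (\partial^{\beta - \alpha} \hat{b}).$

Thus,

$\|\partial^\beta B\|_1 = \| \sum_{\alpha \le \beta} \binom{\beta}{\alpha} (\partial^\alpha \hat{b}) \cdot (\partial^{\beta - \alpha} \hat{b}) \|_1$

$\le \sum_{\alpha \le \beta} \binom{\beta}{\alpha} \|\partial^\alpha \hat{b}\|_2 \cdot \|\partial^{\beta - \alpha} \hat{b}\|_2$   (4.1)

$= \sum_{\alpha \le \beta} \binom{\beta}{\alpha} \|x^\alpha \cdot b\|_2 \cdot \|x^{\beta - \alpha} \cdot b\|_2$   (4.2)

$\le \sum_{\alpha \le \beta} \binom{\beta}{\alpha}$   (4.3)

$= 2^{|\beta|}.$   (4.4)

Eq. (4.1) follows by Cauchy-Schwarz. Eq. (4.2) follows from Plancherel's theorem, since the Fourier
transform of $\partial^\alpha \hat{b}$ is $x^\alpha \cdot b$, up to factors of $i$. Eq. (4.3) follows since $\|x^\alpha \cdot b\|_2 \le \|b\|_2 = 1$. Eq. (4.4)
is seen combinatorially. Suppose we have $2d$ buckets $A_i^j$ for $(i, j) \in [d] \times [2]$. We also have $|\beta|$ balls,
with each having one of $d$ types with $\beta_i$ balls of type $i$. Then the number of ways to place balls into
buckets such that balls of type $i$ only go into some $A_i^j$ is $2^{|\beta|}$ (each ball has 2 choices). However, it
is also $\sum_{\alpha \le \beta} \binom{\beta}{\alpha}$, since for every placement of balls we must place some number $\alpha_i$ balls of type $i$
in $A_i^1$ and $\beta_i - \alpha_i$ balls in $A_i^2$. ■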
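The combinatorial identity behind Eq. (4.4), $\sum_{\alpha \le \beta} \binom{\beta}{\alpha} = 2^{|\beta|}$, is easy to confirm by enumeration. The following Python sketch does so for a few small multi-indices; the function name `multi_binomial_sum` is an illustrative choice.

```python
from itertools import product
from math import comb, prod


def multi_binomial_sum(beta):
    """Sum over all multi-indices alpha <= beta (coordinate-wise) of the
    multi-index binomial coefficient prod_i C(beta_i, alpha_i)."""
    ranges = [range(b + 1) for b in beta]
    return sum(
        prod(comb(b, a) for a, b in zip(alpha, beta))
        for alpha in product(*ranges)
    )


for beta in [(0,), (3,), (2, 1), (1, 2, 3), (0, 0, 4, 1)]:
    print(beta, multi_binomial_sum(beta), 2 ** sum(beta))
```

The identity factorizes across coordinates, since $\sum_{a=0}^{\beta_i} \binom{\beta_i}{a} = 2^{\beta_i}$ for each $i$, which is another way to read the balls-and-buckets argument.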



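Lemma 4.6. Let $z > 0$ be arbitrary. Then

$\int_{\|x\|_2 > dz} B(x) dx = O(1/z^2).$

Proof. Consider the integral

$S = \int_{\mathbb{R}^d} \|x\|_2^2 \cdot B(x) dx = \sum_{i=1}^d \left( \int_{\mathbb{R}^d} x_i^2 \cdot B(x) dx \right).$

Recalling that $B = \hat{b}^2$, the Fourier transform of $B$ is $(2\pi)^{-d/2} (b * b)$. Each summand above is $(2\pi)^{d/2}$
times the Fourier transform of $x_i^2 \cdot B$, evaluated at 0. Since multiplying a function by $i \cdot x_j$ corresponds
to partial differentiation by $x_j$ in the Fourier domain,

$S = - \sum_{i=1}^d \left( \frac{\partial^2}{\partial x_i^2} (b * b) \right)(0) = - \sum_{i=1}^d \left( \left( \frac{\partial}{\partial x_i} b \right) * \left( \frac{\partial}{\partial x_i} b \right) \right)(0) = \sum_{i=1}^d \left\| \frac{\partial}{\partial x_i} b \right\|_2^2$

with the last equality using that $\frac{\partial}{\partial x_i} b$ is odd.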
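
We have, for $x$ in the unit ball,

$\left( \frac{\partial}{\partial x_i} b \right)(x) = -2 \sqrt{C_d} \cdot x_i,$

so that, after switching to hyperspherical coordinates,

$\sum_{i=1}^d \left\| \frac{\partial}{\partial x_i} b \right\|_2^2 = A_d \cdot \int_0^1 4 r^{d+1} dr.$   (4.5)

Claim 4.7. $\sum_{i=1}^d \left\| \frac{\partial}{\partial x_i} b \right\|_2^2 = O(d^2)$.

Proof. By definition of $b$,

$\|b\|_2^2 = A_d \cdot \int_0^1 \left( r^{d+3} - 2 r^{d+1} + r^{d-1} \right) dr = A_d \cdot \left( \frac{1}{d+4} - \frac{2}{d+2} + \frac{1}{d} \right) = A_d \cdot \frac{8}{d(d+2)(d+4)}.$

We also have by Eq. (4.5) that

$\sum_{i=1}^d \left\| \frac{\partial}{\partial x_i} b \right\|_2^2 = A_d \cdot \frac{4}{d+2} = A_d \cdot O(1/d).$

The claim follows since $\|b\|_2 = 1$. ■
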

We now finish the proof of the lemma. Since $B$ has unit integral on $\mathbb{R}^d$ (Lemma 4.4) and is
nonnegative everywhere, we can view $B$ as the density function of a probability distribution on $\mathbb{R}^d$.
Then $S$ can be viewed as $E_{x \sim B}[\|x\|_2^2]$. Then by Markov's inequality, for $x \sim B$,

$\Pr\left[ \|x\|_2^2 \ge z^2 \cdot E[\|x\|_2^2] \right] \le 1/z^2,$

which is equivalent to

$\Pr\left[ \|x\|_2 \ge z \cdot \sqrt{E[\|x\|_2^2]} \right] \le 1/z^2.$

We conclude by observing that the above probability is simply

$\int_{\|x\|_2 \ge z \cdot \sqrt{E[\|x\|_2^2]}} B(x) dx,$

from which the lemma follows since $E[\|x\|_2^2] = O(d^2)$ by Claim 4.7. ■
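The two radial integrals in Claim 4.7 can be evaluated exactly, which gives $A_d = d(d+2)(d+4)/8$ and $S = d(d+4)/2$. The Python sketch below checks this with exact rational arithmetic; the function names are illustrative, and the closed forms are derived here from Eq. (4.5) and the normalization $\|b\|_2 = 1$ rather than quoted from the paper.

```python
from fractions import Fraction


def radial_integral(power):
    """Exact value of the integral of r**power over [0, 1]."""
    return Fraction(1, power + 1)


def A_d(d):
    # Normalization: A_d * int_0^1 (1 - r^2)^2 r^(d-1) dr must equal 1.
    norm_integral = (
        radial_integral(d + 3) - 2 * radial_integral(d + 1) + radial_integral(d - 1)
    )
    return 1 / norm_integral


def second_moment(d):
    # Eq. (4.5): S = A_d * int_0^1 4 r^(d+1) dr.
    return A_d(d) * 4 * radial_integral(d + 1)


for d in range(1, 6):
    print(d, A_d(d), second_moment(d))
```

Since $d(d+4)/2 \le (5/2) d^2$ for every $d \ge 1$, this is consistent with the $O(d^2)$ bound in the claim.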

We now state the main theorem of this section, which says that if $F$ is bounded, then $\tilde{F}^c$ is
smooth with strong bounds on its mixed partial derivatives, and is close to $F$ on points where $F$
satisfies some continuity property.

Theorem 4.8. Let $F : \mathbb{R}^d \to \mathbb{R}$ be bounded and $c > 0$ be arbitrary. Then,

i. $\|\partial^\beta \tilde{F}^c\|_\infty \le \|F\|_\infty \cdot (2c)^{|\beta|}$ for all $\beta \in \mathbb{N}^d$.

ii. Fix some $x \in \mathbb{R}^d$. Then if $|F(x) - F(y)| \le \epsilon$ whenever $\|x - y\|_2 \le \delta$ for some $\epsilon, \delta \ge 0$, then
$|\tilde{F}^c(x) - F(x)| \le \epsilon + \|F\|_\infty \cdot O(d^2 / (c^2 \delta^2))$.



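Proof. We first prove (i).

$|(\partial^\beta \tilde{F}^c)(x)| = |(\partial^\beta (B_c * F))(x)|$

$= |((\partial^\beta B_c) * F)(x)|$

$= \left| \int_{\mathbb{R}^d} (\partial^\beta B_c)(y) F(x - y) dy \right|$

$\le \|F\|_\infty \cdot \|\partial^\beta B_c\|_1$

$= \|F\|_\infty \cdot c^{|\beta|} \cdot \|\partial^\beta B\|_1$   (4.6)

$\le \|F\|_\infty \cdot (2c)^{|\beta|}$

with the last inequality holding by Lemma 4.5.

We now prove (ii).

$\tilde{F}^c(x) = (B_c * F)(x)$

$= \int_{\mathbb{R}^d} B_c(x - y) F(y) dy$

$= F(x) + \int_{\mathbb{R}^d} (F(y) - F(x)) B_c(x - y) dy$   (4.7)

$= F(x) + \int_{\|x - y\|_2 < \delta} (F(y) - F(x)) B_c(x - y) dy + \int_{\|x - y\|_2 \ge \delta} (F(y) - F(x)) B_c(x - y) dy$

$= F(x) \pm \epsilon \cdot \int_{\|x - y\|_2 < \delta} |B_c(x - y)| dy + \int_{\|x - y\|_2 \ge \delta} (F(y) - F(x)) B_c(x - y) dy$

$= F(x) \pm \epsilon \cdot \int_{\mathbb{R}^d} B_c(x - y) dy + \int_{\|x - y\|_2 \ge \delta} (F(y) - F(x)) B_c(x - y) dy$

$= F(x) \pm \epsilon \pm \|F\|_\infty \cdot \int_{\|x - y\|_2 \ge \delta} B_c(x - y) dy$

$= F(x) \pm \epsilon \pm \|F\|_\infty \cdot \int_{\|u\|_2 \ge c\delta} B(u) du$

$= F(x) \pm \epsilon \pm \|F\|_\infty \cdot O(d^2 / (c^2 \delta^2))$

where Eq. (4.7) uses Lemma 4.4, and the final step uses Lemma 4.6. ■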

Remark 4.9. It is possible to obtain sharper bounds on $\|\partial^\beta \tilde{F}^c\|_\infty$. In particular, note in the
proof of Theorem 4.8 that $\|\partial^\beta \tilde{F}^c\|_\infty \le \|F\|_\infty \cdot c^{|\beta|} \cdot \|\partial^\beta B\|_1$. An improved bound on $\|\partial^\beta B\|_1$ versus
that of Lemma 4.5 turns out to be possible. This improvement is useful when FT-mollifying over
high dimension, but in the proof of our main result (Theorem 1.1) we are never concerned with
$d > 4$. We thus above presented a simpler proof for clarity of exposition, and we defer the details
of the improvement to Section G.1.

The following theorem is immediate from Theorem 4.8, and gives guarantees when FT-mollifying
the indicator function of some region. In Theorem 4.10 and some later proofs which invoke the
theorem, we use the following notation. For $R \subseteq \mathbb{R}^d$, we let $\partial R$ denote the boundary of $R$ (specifically
in this context, $\partial R$ is the set of points $x \in \mathbb{R}^d$ such that for every $\epsilon > 0$, the ball about $x$ of
radius $\epsilon$ intersects both $R$ and $\mathbb{R}^d \setminus R$).

Theorem 4.10. For any region $R \subseteq \mathbb{R}^d$ and $x \in \mathbb{R}^d$,

$|I_R(x) - \tilde{I}_R^c(x)| \le \min\left\{ 1, O\left( \left( \frac{d}{c \cdot d_2(x, \partial R)} \right)^2 \right) \right\}.$

Proof. We have $|I_R(x) - \tilde{I}_R^c(x)| \le 1$ always. This follows since $\tilde{I}_R^c$ is nonnegative (it is the
convolution of nonnegative functions), and is never larger than $\|I_R\|_\infty = 1$. The other bound is
obtained, for $x \notin \partial R$, by applying Theorem 4.8 to $F = I_R$ with $\epsilon = 0$, $\delta = d_2(x, \partial R)$. ■

5 A spectral moment bound for quadratic forms 

For a quadratic form $p(x) = \sum_{i \le j} a_{i,j} x_i x_j$, we can associate a real symmetric matrix $A_p$ which has
the $a_{i,i}$ on the diagonal and $a_{\min\{i,j\},\max\{i,j\}}/2$ on the off-diagonals, so that $p(x) = x^T A_p x$. We
now show a moment bound for quadratic forms which takes into account the maximum eigenvalue
of $A_p$. Our proof is partly inspired by a proof of Whittle [42], who showed the hypercontractive
inequality for degree-2 polynomials when comparing $q$-norms to 2-norms (see Theorem B.1).

Recall the Frobenius norm of $A \in \mathbb{R}^{n \times n}$ is $\|A\|_2 = \sqrt{\sum_{i,j=1}^n A_{i,j}^2} = \sqrt{\sum_i \lambda_i^2} = \sqrt{\mathrm{tr}(A^2)}$, where
tr denotes trace and $A$ has eigenvalues $\lambda_1, \ldots, \lambda_n$. Also, let $\|A\|_\infty$ be the largest magnitude of
an eigenvalue of $A$. We can now state and prove the main theorem of this section, which plays a
crucial role in our analysis of the regular case of our main theorem (Theorem 1.1).

Theorem 5.1. Let $A \in \mathbb{R}^{n \times n}$ be symmetric and $x \in \{-1,1\}^n$ be random. Then for all $k \ge 2$,

$E[|(x^T A x) - \mathrm{tr}(A)|^k] \le C^k \cdot \max\{\sqrt{k} \|A\|_2, k \|A\|_\infty\}^k$

where $C$ is an absolute constant.

Note if $\sum_{i \le j} a_{i,j}^2 \le 1$ then $\|A_p\|_\infty \le 1$, in which case our bound recovers a similar moment bound
as the one obtained via hypercontractivity. Thus, in the special case of bounding $k$th moments of
degree-2 polynomials against their 2nd moment, our bound can be viewed as a generalization of
the hypercontractive inequality (and of Whittle's inequality).
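For small $n$ the left-hand side of Theorem 5.1 can be computed exactly by enumerating the hypercube. The Python sketch below does this for one hypothetical $3 \times 3$ matrix and compares against the right-hand side with $C = 128$, the constant the proof in this section arrives at. The matrix, the helper names, and the power-iteration estimate of $\|A\|_\infty$ (an approximation from below) are all assumptions of this sketch.

```python
import math
from itertools import product


def quad_form(A, x):
    n = len(x)
    return sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def centered_moment(A, k):
    """E[|x^T A x - tr(A)|^k] for x uniform on {-1,1}^n, by enumeration."""
    n = len(A)
    tr = sum(A[i][i] for i in range(n))
    pts = list(product((-1, 1), repeat=n))
    return sum(abs(quad_form(A, x) - tr) ** k for x in pts) / len(pts)


def frobenius(A):
    return math.sqrt(sum(v * v for row in A for v in row))


def spectral_norm(A, iters=200):
    """Largest |eigenvalue| of symmetric A, estimated by power iteration on A^2.
    The estimate never exceeds the true value."""
    n = len(A)
    v = [1.0 + 0.1 * i for i in range(n)]
    for _ in range(iters):
        w = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
        w = [sum(A[i][j] * w[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(t * t for t in w))
        v = [t / norm for t in w]
    Av = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
    return math.sqrt(sum(t * t for t in Av))


# Hypothetical symmetric matrix with trace 0.
A = [[1, 2, 0],
     [2, -1, 3],
     [0, 3, 0]]

for k in (2, 4, 6):
    lhs = centered_moment(A, k)
    rhs = (128 * max(math.sqrt(k) * frobenius(A), k * spectral_norm(A))) ** k
    print(k, lhs, rhs)
```

For $k = 2$ the enumeration reproduces the identity $E[(x^T A x)^2] = 4 \sum_{i<j} A_{i,j}^2$ used in the base case of the proof; here that value is $4 \cdot (2^2 + 3^2) = 52$.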

We first give two lemmas. The first is implied by Khintchine's inequality [21], and the second
is a discrete analog of one of Whittle's lemmas.

Lemma 5.2. For $a \in \mathbb{R}^n$, $x$ as above, and $k \ge 2$ an even integer, $E[(a^T x)^k] \le \|a\|_2^k \cdot k^{k/2}$.

Lemma 5.3. If $X, Y$ are independent with $E[Y] = 0$ and if $k \ge 2$, then $E[|X|^k] \le E[|X - Y|^k]$.

Proof. Consider the function $f(y) = |X - y|^k$. Since $f^{(2)}$, the second derivative of $f$, is nonnegative
on $\mathbb{R}$, the claim follows by Taylor's theorem since $|X - Y|^k \ge |X|^k - kY(\mathrm{sgn}(X) \cdot X)^{k-1}$. ■

We are now prepared to prove our Theorem 5.1.

Proof (of Theorem 5.1). Without loss of generality we can assume $\mathrm{tr}(A) = 0$. This is because if
one considers $A' = A - (\mathrm{tr}(A)/n) \cdot I$, then $x^T A x - \mathrm{tr}(A) = x^T A' x$, and we have $\|A'\|_2 \le \|A\|_2$ and
$\|A'\|_\infty \le 2\|A\|_\infty$. We now start by proving our theorem for $k$ a power of 2 by induction on $k$. For
$k = 2$, $E[(x^T A x)^2] = 4 \sum_{i<j} A_{i,j}^2$ and $\|A\|_2^2 = \sum_i A_{i,i}^2 + 2 \sum_{i<j} A_{i,j}^2$. Thus $E[(x^T A x)^2] \le 2\|A\|_2^2$.

Next we assume the statement of our Theorem for $k/2$ and attempt to prove it for $k$.
We note that by Lemma 5.3,

$E[|x^T A x|^k] \le E[|x^T A x - y^T A y|^k] = E[|(x + y)^T A (x - y)|^k],$

where $y \in \{-1,1\}^n$ is random and independent of $x$. Notice that if we swap $x_i$ with $y_i$ then $x + y$
remains constant as does $|x_j - y_j|$ and that $x_i - y_i$ is replaced by its negation. Consider averaging
over all such swaps. Let $\xi_i = ((x + y)^T A)_i$ and $\eta_i = x_i - y_i$. Let $z_i$ be 1 if we did not swap and $-1$
if we did. Then $(x + y)^T A (x - y) = \sum_i \xi_i \eta_i z_i$. Averaging over all swaps,

$E_z[|(x + y)^T A (x - y)|^k] \le \left( \sum_i \xi_i^2 \eta_i^2 \right)^{k/2} \cdot k^{k/2} \le 2^k k^{k/2} \cdot \left( \sum_i \xi_i^2 \right)^{k/2}.$

The first inequality is by Lemma 5.2, and the second uses that $|\eta_i| \le 2$. Note that

$\sum_i \xi_i^2 = \|A(x + y)\|_2^2 \le 2\|Ax\|_2^2 + 2\|Ay\|_2^2,$

and hence

$E[|x^T A x|^k] \le 2^k \sqrt{k}^k \, E[(2\|Ax\|_2^2 + 2\|Ay\|_2^2)^{k/2}] \le 4^k \sqrt{k}^k \, E[(\|Ax\|_2^2)^{k/2}],$

with the final inequality using Minkowski's inequality (namely that $|E[|X + Y|^p]|^{1/p} \le |E[|X|^p]|^{1/p} +
|E[|Y|^p]|^{1/p}$ for any random variables $X, Y$ and any $1 \le p < \infty$).

Next note $\|Ax\|_2^2 = \langle Ax, Ax \rangle = x^T A^2 x$. Let $B = A^2 - \frac{\mathrm{tr}(A^2)}{n} I$. Then $\mathrm{tr}(B) = 0$. Also,
$\|B\|_2 \le \|A\|_2 \|A\|_\infty$ and $\|B\|_\infty \le \|A\|_\infty^2$. The former holds since

$\|B\|_2^2 = \left( \sum_i \lambda_i^4 \right) - \left( \sum_i \lambda_i^2 \right)^2 / n \le \sum_i \lambda_i^4 \le \|A\|_2^2 \|A\|_\infty^2.$

The latter holds since the eigenvalues of $B$ are $\lambda_i^2 - (\sum_{j=1}^n \lambda_j^2)/n$ for each $i \in [n]$. The largest
eigenvalue of $B$ is thus at most that of $A^2$, and since $\lambda_i^2 \ge 0$, the smallest eigenvalue of $B$ cannot
be smaller than $-\|A\|_\infty^2$.

We then have

$E[(\|Ax\|_2^2)^{k/2}] = E\left[ \left| \|A\|_2^2 + x^T B x \right|^{k/2} \right] \le 2^k \max\{\|A\|_2^k, E[|x^T B x|^{k/2}]\}.$

Hence employing the inductive hypothesis on $B$ we have that

$E[|x^T A x|^k] \le 8^k \max\{\sqrt{k} \|A\|_2, \; C^{1/2} k^{3/4} \sqrt{\|B\|_2}, \; C^{1/2} k \sqrt{\|B\|_\infty}\}^k$

$\le 8^k C^{k/2} \max\{\sqrt{k} \|A\|_2, \; k^{3/4} \sqrt{\|A\|_2 \|A\|_\infty}, \; k \|A\|_\infty\}^k$

$= 8^k C^{k/2} \max\{\sqrt{k} \|A\|_2, \; k \|A\|_\infty\}^k,$

with the final equality holding since the middle term above is the geometric mean of the other two,
and thus is dominated by at least one of them. This proves our hypothesis as long as $C \ge 64$.

To prove our statement for general $k$, set $k' = 2^{\lceil \log_2 k \rceil}$. Then by the power mean inequality and
our results for $k'$ a power of 2, $E[|x^T A x|^k] \le (E[|x^T A x|^{k'}])^{k/k'} \le 128^k \max\{\sqrt{k} \|A\|_2, k \|A\|_\infty\}^k$. ■

6 Fooling regular degree-2 threshold functions

The main theorem of this section is the following.

Theorem 6.1. Let < e < 1 be given. Let Xi, . . . , Xn be independent Bernoulh and Yi, . . . , y„ 
be 2/c-wise independent Bernouhi for k a sufficiently large multiple of 1/e^. If p is multilinear and 
of degree 2 with J2\s\>oPs ~ ^' ^^"^ hifj(p) < r for all i, then 

E[sgn(p(X))] - E[sgn(p(y))] = 0{e + r^^). 

Throughout this section, $p$ always refers to the polynomial of Theorem 6.1, and $\tau$ refers to the
maximum influence of any variable in $p$. Observe $p$ (over the hypercube) can be written as $q + p_4 + C$,
where $q$ is a multilinear quadratic form, $p_4$ is a linear form, and $C$ is a constant. Furthermore,
$\|A_q\|_2^2 \le 1/2$ and $\sum_S \widehat{p_4}_S^2 \le 1$. Using the spectral theorem for real symmetric matrices, we write
$p = p_1 - p_2 + p_3 + p_4 + C$ where $p_1, p_2, p_3$ are quadratic forms satisfying $\lambda_{\min}(A_{p_1}), \lambda_{\min}(A_{p_2}) \ge \delta$,
$\|A_{p_3}\|_\infty < \delta$, and $\|A_{p_i}\|_2^2 \le 1/2$ for $1 \le i \le 3$, and also with $p_1, p_2$ positive semidefinite (see
Lemma B.7 for details on how this is accomplished). Here $\lambda_{\min}(A)$ denotes the smallest magnitude
of a non-zero eigenvalue of $A$. Throughout this section we let $p_1, \ldots, p_4, C, \delta$ be as discussed here.
We use $T$ to denote $\mathrm{tr}(A_{p_3})$. The value $\delta$ will be set later in the proof of Theorem 6.1.

Throughout this section it will be notationally convenient to define the map $M_p : \mathbb{R}^n \to \mathbb{R}^4$
by $M_p(x) = (\sqrt{p_1(x)}, \sqrt{p_2(x)}, p_3(x) - T, p_4(x))$. Note that the first two coordinates of $M_p(x)$ are
indeed always real since $p_1, p_2$ are positive semidefinite.

Before giving the proof of Theorem 6.1 we first prove Lemma 6.3, which states that for $F : \mathbb{R}^4 \to \mathbb{R}$,
$F(M_p(x))$ is fooled by bounded independence as long as $F$ is even in $x_1, x_2$ and certain
technical conditions are satisfied. The proof of Lemma 6.3 invokes the following lemma, which
follows from lemmas in the Appendix (specifically, by combining Lemma A.6 and Lemma B.5).

Lemma 6.2. For a quadratic form $f$ and random $x \in \{-1,1\}^n$,

$E[|f(x)|^k] \le 2^{O(k)} \cdot \left( (\|A_f\|_2 k)^k + (\|A_f\|_2^2 / \lambda_{\min}(A_f))^k \right).$

Lemma 6.3. Let $\epsilon > 0$ be arbitrary. Let $F : \mathbb{R}^4 \to \mathbb{R}$ be even in each of its first two arguments
such that $\|\partial^\beta F\|_\infty = O(\alpha^{|\beta|})$ for all multi-indices $\beta \in \mathbb{N}^4$ and some $\alpha > 1$. Suppose $1/\delta \ge B\alpha$
for a sufficiently large constant $B$. Let $X_1, \ldots, X_n$ be independent Bernoulli, and $Y_1, \ldots, Y_n$ be
$k'$-independent Bernoulli for $k' = 2k$ with $k \ge \max\{\log(1/\epsilon), B\alpha/\sqrt{\delta}, B\alpha^2\}$ an even integer. Write
$X = (X_1, \ldots, X_n)$ and $Y = (Y_1, \ldots, Y_n)$. Then $|E[F(M_p(X))] - E[F(M_p(Y))]| < \epsilon$.

Proof. We Taylor-expand $F$ to obtain a polynomial $P_{k-1}$ containing all monomials up to degree
$k - 1$. Since $F(x)$ is even in $x_1, x_2$, we can assume $P_{k-1}$ is a polynomial in $x_1^2, x_2^2, x_3, x_4$. Let
$x \in \mathbb{R}^4$ be arbitrary. We apply Taylor's theorem to bound $R(x) = |F(x) - P_{k-1}(x)|$. Define
$x_* = \max_i \{|x_i|\}$. Then

$R(x) \le \alpha^k \cdot \sum_{|\beta| = k} \frac{|x_1|^{\beta_1} \cdot |x_2|^{\beta_2} \cdot |x_3|^{\beta_3} \cdot |x_4|^{\beta_4}}{\beta_1! \cdot \beta_2! \cdot \beta_3! \cdot \beta_4!}$

$\le \alpha^k x_*^k \cdot \sum_{|\beta| = k} \frac{1}{\beta_1! \cdot \beta_2! \cdot \beta_3! \cdot \beta_4!}$

$\le \alpha^k 2^{O(k)} \cdot \frac{x_1^k + x_2^k + x_3^k + x_4^k}{k^k},$

with the absolute values unnecessary in the last inequality since $k$ is even. We now observe

$|E[F(M_p(X))] - E[F(M_p(Y))]| \le \alpha^k 2^{O(k)} \cdot \frac{E[(p_1(X))^{k/2}] + E[(p_2(X))^{k/2}] + E[(p_3(X) - T)^k] + E[(p_4(X))^k]}{k^k}$

since (a) every term in $P_{k-1}(M_p(X))$ is a monomial of degree at most $2k - 2$ in the $X_i$, by evenness
of $P_{k-1}$ in $x_1, x_2$, and is thus determined by $2k$-independence, (b) $\sqrt{p_1(X)}, \sqrt{p_2(X)}$ are real by
positive semidefiniteness of $p_1, p_2$ (note that we are only given that the high order partial derivatives
are bounded by $O(\alpha^k)$ on the reals; we have no guarantees for complex arguments), and (c) the
moment expectations above are equal for $X$ and $Y$ since they are determined by $2k$-independence.
We now bound the error term above. We have

$E[(p_1(X))^{k/2}] = 2^{O(k)} (k^{k/2} + \delta^{-k/2})$

by Lemma 6.2, with the same bound holding for $E[(p_2(X))^{k/2}]$. We also have

$E[(p_3(X) - T)^k] \le 2^{O(k)} \cdot \max\{\sqrt{k}, \delta k\}^k$

by Theorem 5.1. We finally have

$E[(p_4(X))^k] \le k^{k/2}$

by Lemma 5.2. Thus in total,

$|E[F(M_p(X))] - E[F(M_p(Y))]| \le 2^{O(k)} \cdot \left( (\alpha/\sqrt{k})^k + (\alpha/(k\sqrt{\delta}))^k + (\alpha\delta)^k \right),$

which is at most $\epsilon$ for sufficiently large $B$ by our lower bounds on $k$ and $1/\delta$. ■

In proving Theorem 6.1 we will need a lemma which states that $p$ is anticoncentrated even
when evaluated on Bernoulli random variables which are $k$-wise independent. To show this, we
make use of the following lemma, which follows from the Invariance Principle, the hypercontractive
inequality, and the anticoncentration bound of [12]. The proof is in Section D.

Lemma 6.4. Let $\eta, \eta' \ge 0$, $t \in \mathbb{R}$ be given, and let $X_1, \ldots, X_n$ be independent Bernoulli. Then

$$\Pr[|p(X) - t| \le \eta \cdot (\sqrt{p_1(X)} + \sqrt{p_2(X)} + 1) + \eta'] = O(\sqrt{\eta'} + (\eta^2/\delta)^{1/4} + \tau^{1/9} + \exp(-\Omega(1/\delta))).$$

We now prove our anticoncentration lemma in the case of limited independence. 

Lemma 6.5. Let $\epsilon'$ be given. Suppose $k \ge D/(\epsilon')^4$ for a sufficiently large constant $D > 0$. Let
$Y_1, \ldots, Y_n$ be $k$-wise independent Bernoulli, and let $t \in \mathbb{R}$ be arbitrary. Then

$$\Pr[|p(Y) - t| < \epsilon'] \le O(\sqrt{\epsilon'} + \tau^{1/9}).$$

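Proof. Define the region $T_{t,\epsilon'} = \{(x_1, x_2, x_3, x_4) : |x_1^2 - x_2^2 + x_3 + x_4 + C + \Upsilon - t| < \epsilon'\}$, and also
the region $S_{\rho,t,\epsilon'} = \{x : d_2(x, T_{t,\epsilon'}) \le \rho\}$ for $\rho \ge 0$. Consider the FT-mollification $\tilde{I}^c_{S_{\rho,t,\epsilon'}}$ of $I_{S_{\rho,t,\epsilon'}}$
for $c = A/\rho$, with $A$ a large constant to be determined later. We note a few properties of $\tilde{I}^c_{S_{\rho,t,\epsilon'}}$:

i. $\|\partial^\beta \tilde{I}^c_{S_{\rho,t,\epsilon'}}\|_\infty \le (2c)^{|\beta|}$

ii. $\tilde{I}^c_{S_{\rho,t,\epsilon'}}(x) \ge \frac{1}{2} \cdot I_{T_{t,\epsilon'}}(x)$

iii. $\tilde{I}^c_{S_{\rho,t,\epsilon'}}(x) = \min\{1, O((c \cdot d_2(x, T_{t,\epsilon'}))^{-2})\}$ for any $x$ with $d_2(x, T_{t,\epsilon'}) \ge 2\rho$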

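Item (i) is straightforward from Theorem 4.8. For item (ii), note that if $x \in T_{t,\epsilon'}$, then
$d_2(x, \partial S_{\rho,t,\epsilon'}) \ge \rho$, implying

$$|\tilde{I}^c_{S_{\rho,t,\epsilon'}}(x) - 1| = O\left(\frac{1}{c^2\rho^2}\right),$$

which is at most $1/2$ for $A$ a sufficiently large constant. Furthermore, $\tilde{I}^c_{S_{\rho,t,\epsilon'}}$ is nonnegative. Finally,
for (iii), by Theorem 4.10 we have

$$\tilde{I}^c_{S_{\rho,t,\epsilon'}}(x) = \min\{1, O((c \cdot d_2(x, \partial S_{\rho,t,\epsilon'}))^{-2})\}$$

$$\le \min\{1, O((c \cdot d_2(x, S_{\rho,t,\epsilon'}))^{-2})\}$$

$$\le \min\{1, O((c \cdot (d_2(x, T_{t,\epsilon'}) - \rho))^{-2})\}$$

$$\le \min\{1, O((c \cdot d_2(x, T_{t,\epsilon'}))^{-2})\}$$

with the last inequality using that $d_2(x, T_{t,\epsilon'}) \ge 2\rho$.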

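Noting $\Pr[|p(Z) - t| < \epsilon'] = \mathbf{E}[I_{T_{t,\epsilon'}}(M_p(Z))]$ for any random variable $Z = (Z_1, \ldots, Z_n)$, item
(ii) tells us that

$$\Pr[|p(Z) - t| < \epsilon'] \le 2 \cdot \mathbf{E}[\tilde{I}^c_{S_{\rho,t,\epsilon'}}(M_p(Z))]. \qquad (6.2)$$

We now proceed in two steps. We first show $\mathbf{E}[\tilde{I}^c_{S_{\rho,t,\epsilon'}}(M_p(X))] = O(\sqrt{\epsilon'} + \tau^{1/9})$ by applications of
Lemma 6.4. We then show $\mathbf{E}[\tilde{I}^c_{S_{\rho,t,\epsilon'}}(M_p(Y))] = O(\sqrt{\epsilon'} + \tau^{1/9})$ by applying Lemma 6.3, at which
point we will have proven our lemma via Eq. (6.2) with $Z = Y$.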

$\mathbf{E}[\tilde{I}^c_{S_{\rho,t,\epsilon'}}(M_p(X))] = O(\sqrt{\epsilon'} + \tau^{1/9})$: We first observe that for $x \notin T_{t,\epsilon'}$,

$$d_2(x, T_{t,\epsilon'}) \ge \frac{1}{2} \cdot \min\left\{\frac{|x_1^2 - x_2^2 + x_3 + x_4 + C + \Upsilon - t| - \epsilon'}{2(|x_1| + |x_2| + 1)}, \sqrt{|x_1^2 - x_2^2 + x_3 + x_4 + C + \Upsilon - t| - \epsilon'}\right\}. \qquad (6.3)$$

This is because by adding a vector $v$ to $x$, we can change each individual coordinate of $x$ by at
most $\|v\|_2$, and can thus change the value of $|x_1^2 - x_2^2 + x_3 + x_4 + C + \Upsilon - t| - \epsilon'$ by at most

$$2\|v\|_2 \cdot (|x_1| + |x_2| + 1) + \|v\|_2^2.$$

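Now let $X \in \{-1, 1\}^n$ be uniformly random. We thus have that, for any particular $w > 0$,

$$\Pr[0 < d_2(M_p(X), T_{t,\epsilon'}) \le w] \le \Pr\left[\min\left\{\frac{|p(X) - t| - \epsilon'}{2(\sqrt{p_1(X)} + \sqrt{p_2(X)} + 1)}, \sqrt{|p(X) - t| - \epsilon'}\right\} \le 2w\right]$$

$$\le \Pr[|p(X) - t| \le 4w \cdot (\sqrt{p_1(X)} + \sqrt{p_2(X)} + 1) + \epsilon'] + \Pr[|p(X) - t| \le 4w^2 + \epsilon']$$

$$= O(\sqrt{\epsilon'} + w + (w^2/\delta)^{1/4} + \tau^{1/9} + \exp(-\Omega(1/\delta)))$$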



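with the last inequality holding by Lemma 6.4.

Now, by item (iii),

$$\mathbf{E}[\tilde{I}^c_{S_{\rho,t,\epsilon'}}(M_p(X))] \le \Pr[d_2(M_p(X), T_{t,\epsilon'}) \le 2\rho] + O\left(\sum_{s=1}^{\infty} 2^{-2s} \cdot \Pr[2^s\rho < d_2(M_p(X), T_{t,\epsilon'}) \le 2^{s+1}\rho]\right)$$

$$\le O(\sqrt{\epsilon'} + \rho + (\rho^2/\delta)^{1/4} + \tau^{1/9} + \exp(-\Omega(1/\delta))) + O\left(\sum_{s=1}^{\infty} 2^{-2s} \cdot \left(\sqrt{\epsilon'} + 2^{s+1}\rho + (2^{2s+2}\rho^2/\delta)^{1/4} + \tau^{1/9} + \exp(-\Omega(1/\delta))\right)\right)$$

$$= O(\sqrt{\epsilon'} + \rho + (\rho^2/\delta)^{1/4} + \tau^{1/9} + \exp(-\Omega(1/\delta))). \qquad (6.4)$$

We now make the settings

$$\rho = (\epsilon')^2, \qquad \frac{1}{\delta} = \frac{2AB}{\rho} = 2Bc,$$

where $B > 1$ is the sufficiently large constant in Lemma 6.3. Thus Eq. (6.4) is now $O(\sqrt{\epsilon'} + \tau^{1/9})$.
(We remark that a different $\delta$ is used when proving Theorem 6.1.)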



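$\mathbf{E}[\tilde{I}^c_{S_{\rho,t,\epsilon'}}(M_p(Y))] = O(\sqrt{\epsilon'} + \tau^{1/9})$: It suffices to show

$$\mathbf{E}[\tilde{I}^c_{S_{\rho,t,\epsilon'}}(M_p(Y))] \approx_{\epsilon'} \mathbf{E}[\tilde{I}^c_{S_{\rho,t,\epsilon'}}(M_p(X))].$$

We remark that $\tilde{I}^c_{S_{\rho,t,\epsilon'}}$ can be assumed to be even in both $x_1, x_2$. If not, then consider the
symmetrization

$$\left(\tilde{I}^c_{S_{\rho,t,\epsilon'}}(x_1, x_2, x_3, x_4) + \tilde{I}^c_{S_{\rho,t,\epsilon'}}(-x_1, x_2, x_3, x_4) + \tilde{I}^c_{S_{\rho,t,\epsilon'}}(x_1, -x_2, x_3, x_4) + \tilde{I}^c_{S_{\rho,t,\epsilon'}}(-x_1, -x_2, x_3, x_4)\right)/4, \qquad (6.5)$$

which does not affect any of our properties (i), (ii), (iii).

Now, by our choice of $k$, $\delta$ and item (i), we have by Lemma 6.3 (with $\alpha = 2c$) that

$$|\mathbf{E}[\tilde{I}^c_{S_{\rho,t,\epsilon'}}(M_p(X))] - \mathbf{E}[\tilde{I}^c_{S_{\rho,t,\epsilon'}}(M_p(Y))]| < \epsilon'.$$

This completes our proof by applying Eq. (6.2) with $Z = Y$. ■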

The following Corollary is proven similarly to Lemma 6.4, but uses anticoncentration under
bounded independence (which we just proved in Lemma 6.5). The proof is in Section D.

Corollary 6.6. Let $\eta, \eta' \ge 0$ be given, and let $Y_1, \ldots, Y_n$ be $k$-independent Bernoulli for $k$ as in
Lemma 6.5 with $\epsilon' = \min\{\eta/\sqrt{\delta}, \eta'\}$. Also assume $k \ge \lceil 2/\delta \rceil$. Then

$$\Pr[|p(Y) - t| \le \eta \cdot (\sqrt{p_1(Y)} + \sqrt{p_2(Y)} + 1) + \eta'] = O(\sqrt{\eta'} + (\eta^2/\delta)^{1/4} + \tau^{1/9} + \exp(-\Omega(1/\delta))).$$

We are now ready to prove the main theorem of this section. 

Proof (of Theorem 6.1). Consider the region $R \subset \mathbb{R}^4$ defined by $R = \{(x_1, x_2, x_3, x_4) : x_1^2 - x_2^2 +
x_3 + x_4 + C + \Upsilon \ge 0\}$. Then note that $I_{[0,\infty)}(p(x)) = 1$ if and only if $I_R(M_p(x)) = 1$. It thus suffices
to show that $I_R$ is fooled in expectation by bounded independence.

We set $\rho = \epsilon^4$, $c = 1/\rho$, and $1/\delta = 2Bc$ for $B$ the constant in the statement of Lemma 6.3. We
now show a chain of inequalities to give our theorem:

$$\mathbf{E}[I_R(M_p(X))] \approx_{\epsilon + \tau^{1/9}} \mathbf{E}[\tilde{I}^c_R(M_p(X))] \approx_{\epsilon} \mathbf{E}[\tilde{I}^c_R(M_p(Y))] \approx_{\epsilon + \tau^{1/9}} \mathbf{E}[I_R(M_p(Y))]$$

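$\mathbf{E}[I_R(M_p(X))] \approx_{\epsilon + \tau^{1/9}} \mathbf{E}[\tilde{I}^c_R(M_p(X))]$: Similarly to the proof of Lemma 6.5,

$$d_2(x, \partial R) \ge \frac{1}{2} \cdot \min\left\{\frac{|x_1^2 - x_2^2 + x_3 + x_4 + C + \Upsilon|}{2(|x_1| + |x_2| + 1)}, \sqrt{|x_1^2 - x_2^2 + x_3 + x_4 + C + \Upsilon|}\right\},$$

and thus by Lemma 6.4,

$$\Pr[d_2(M_p(X), \partial R) \le w] \le \Pr[|p(X)| \le 4w \cdot (\sqrt{p_1(X)} + \sqrt{p_2(X)} + 1)] + \Pr[|p(X)| \le 4w^2]$$

$$= O(w + (w^2/\delta)^{1/4} + \tau^{1/9} + \exp(-\Omega(1/\delta))).$$

Now, noting $|\mathbf{E}[I_R(M_p(X))] - \mathbf{E}[\tilde{I}^c_R(M_p(X))]| \le \mathbf{E}[|I_R(M_p(X)) - \tilde{I}^c_R(M_p(X))|]$ and applying
Theorem 4.10,

$$|\mathbf{E}[I_R(M_p(X))] - \mathbf{E}[\tilde{I}^c_R(M_p(X))]| \le \Pr[d_2(M_p(X), \partial R) \le 2\rho] + O\left(\sum_{s=1}^{\infty} 2^{-2s} \cdot \Pr[2^s\rho < d_2(M_p(X), \partial R) \le 2^{s+1}\rho]\right)$$

$$\le O(\rho + (\rho^2/\delta)^{1/4} + \tau^{1/9} + \exp(-\Omega(1/\delta))) + O\left(\sum_{s=1}^{\infty} 2^{-2s} \cdot \left(2^{s+1}\rho + (2^{2s+2}\rho^2/\delta)^{1/4} + \tau^{1/9} + \exp(-\Omega(1/\delta))\right)\right)$$

$$= O(\epsilon + \tau^{1/9})$$

by choice of $\rho$, $\delta$ and applications of Lemma 6.4.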

$\mathbf{E}[\tilde{I}^c_R(M_p(X))] \approx_{\epsilon} \mathbf{E}[\tilde{I}^c_R(M_p(Y))]$: As in Eq. (6.5), we can assume $\tilde{I}^c_R$ is even in $x_1, x_2$. We apply
Lemma 6.3 with $\alpha = 2c$, noting that $1/\delta = B\alpha$ and that our setting of $k$ is sufficiently large.

$\mathbf{E}[\tilde{I}^c_R(M_p(Y))] \approx_{\epsilon + \tau^{1/9}} \mathbf{E}[I_R(M_p(Y))]$: The argument is identical to that of the first inequality,
except that we use Corollary 6.6 instead of Lemma 6.4. We remark that we do have sufficient
independence to apply Corollary 6.6 since, mimicking our analysis of the first inequality, we have

$$\Pr[|p(Y)| \le 4\rho \cdot (\sqrt{p_1(Y)} + \sqrt{p_2(Y)} + 1)] + \Pr[|p(Y)| \le 4\rho^2]$$

$$\le \Pr[|p(Y)| \le 4\rho \cdot (\sqrt{p_1(Y)} + \sqrt{p_2(Y)} + 1)] + \Pr[|p(Y)| \le \epsilon^2] \qquad (6.6)$$

since $\rho^2 = o(\epsilon^2)$ (we only changed the second summand). To apply Corollary 6.6 to Eq. (6.6), we
need $k \ge \lceil 2/\delta \rceil$, which is true, and $k = \Omega(1/(\epsilon'')^4)$, for $\epsilon'' = \min\{\rho/\sqrt{\delta}, \epsilon^2\} = \epsilon^2$, which is also
true. Corollary 6.6 then tells us Eq. (6.6) is $O(\epsilon + \tau^{1/9})$. ■

Our main theorem of this Section (Theorem 6.1) also holds under the case that the $X_i, Y_i$ are
standard normal, and without any error term depending on $\tau$. We give a proof in Section D.2 by
reducing back to the Bernoulli case.

7 Reduction to the regular case

In this section, we complete the proof of Theorem 1.1. We accomplish this by providing a reduction
from the general case to the regular case. In fact, such a reduction can be shown to hold for any
degree $d \ge 1$ and establishes the following:

Theorem 7.1. Suppose $K_d$-wise independence $\epsilon$-fools the class of $\tau$-regular degree-$d$ PTFs, for
some parameter $0 < \tau \le \epsilon$. Then $(K_d + L_d)$-wise independence $\epsilon$-fools all degree-$d$ PTFs, where
$L_d = (1/\tau) \cdot (d \log(1/\tau))^{O(d)}$.

Noting that $\tau$-regularity implies that the maximum influence of any particular variable is at
most $d \cdot \tau$, Theorem 6.1 implies that degree-2 PTFs that are $\tau$-regular, for $\tau = O(\epsilon^9)$, are $\epsilon$-fooled
by $K_2$-wise independence for $K_2 = O(\epsilon^{-8}) = \mathrm{poly}(1/\epsilon)$. By plugging in $\tau = O(\epsilon^9)$ in the above
theorem we obtain Theorem 1.1. The proof of Theorem 7.1 is based on recent machinery from
[15].^1 Here we give a sketch, with full details in Section E.




Proof (Sketch) (of Theorem 7.1). Any boolean function $f$ on $\{-1, 1\}^n$ can be expressed as
a binary decision tree where each internal node is labeled by a variable, every root-to-leaf path
corresponds to a restriction $\rho$ that fixes the variables as they are set on the path, and every leaf is
labeled with the restricted subfunction $f_\rho$. The main claim is that, if $f$ is a degree-$d$ PTF, then it has
such a decision-tree representation with certain strong properties. In particular, given an arbitrary
degree-$d$ PTF $f = \mathrm{sgn}(p)$, by [15] there exists a decision tree $\mathcal{T}$ of depth $(1/\tau) \cdot (d \log(1/\tau))^{O(d)}$,
so that with probability $1 - \tau$ over the choice of a random root-to-leaf path^2 $\rho$, the restricted
subfunction (leaf) $f_\rho = \mathrm{sgn}(p_\rho)$ is either a $\tau$-regular degree-$d$ PTF or $\tau$-close to a constant function.

Our proof of Theorem 7.1 is based on the above structural lemma. Under the uniform distribution,
there is some particular distribution on the leaves (the tree is not of uniform height); then
conditioned on the restricted variables, the variables still undetermined at the leaf are still uniform.
With $(K_d + L_d)$-wise independence, a random walk down the tree arrives at each leaf with the same
probability as in the uniform case (since the depth of the tree is at most $L_d$). Hence, the probability
mass of the "bad" leaves is at most $\tau \le \epsilon$ even under bounded independence. Furthermore, the
induced distribution on each leaf (over the unrestricted variables) is $K_d$-wise independent. Consider
a good leaf. Either the leaf is $\tau$-regular, in which case we can apply Theorem 6.1, or it is $\tau$-close
to a constant function. At this point though we arrive at a technical issue. The statement and
proof in [15] concerning "close-to-constant" leaves holds only under the uniform distribution. For
our result, we need a stronger statement that holds under any distribution (on the variables that
do not appear in the path) that has sufficiently large independence. By simple modifications of the
proof in [15], we show that the statement holds even under $O(d \cdot \log(1/\tau))$-wise independence. ■
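The leaf-probability step of the sketch can be checked on a toy instance. The following Python snippet is an illustration only and is not taken from the paper: the tree encoding, the names `leaf_distribution`, `UNIFORM`, `PAIRWISE`, `DEPTH2_TREE` and `DEPTH3_TREE`, and the choice of the pairwise independent distribution $(a, b, ab)$ are all assumptions made for the example. It shows that a depth-2 tree reaches each leaf with the same probability under a 2-wise independent distribution as under the uniform one, while a depth-3 tree need not.

```python
import itertools
from fractions import Fraction

def leaf_distribution(tree, points):
    """Probability of reaching each leaf when x is drawn uniformly from `points`."""
    dist = {}
    for x in points:
        node = tree
        # Internal node: (variable index, subtree if x[var] == -1, subtree if x[var] == +1).
        while isinstance(node, tuple):
            var, if_minus, if_plus = node
            node = if_plus if x[var] == 1 else if_minus
        dist[node] = dist.get(node, Fraction(0)) + Fraction(1, len(points))
    return dist

# Fully uniform distribution on {-1, 1}^3.
UNIFORM = list(itertools.product([-1, 1], repeat=3))
# Pairwise independent distribution: the third bit is the product of the first two.
PAIRWISE = [(a, b, a * b) for a, b in itertools.product([-1, 1], repeat=2)]

# Depth-2 tree: the variable queried second depends on the first answer.
DEPTH2_TREE = (0, (1, "A", "B"), (2, "C", "D"))
# Depth-3 tree that reads all three bits on every path.
DEPTH3_TREE = (0,
               (1, (2, "A", "B"), (2, "C", "D")),
               (1, (2, "E", "F"), (2, "G", "H")))

if __name__ == "__main__":
    print(leaf_distribution(DEPTH2_TREE, UNIFORM) == leaf_distribution(DEPTH2_TREE, PAIRWISE))
    print(leaf_distribution(DEPTH3_TREE, UNIFORM) == leaf_distribution(DEPTH3_TREE, PAIRWISE))
```

Every root-to-leaf path of the depth-2 tree fixes at most two variables, so its probability is determined by 2-wise marginals; this is the same reason the proof sketch needs independence at least the depth of the tree.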

8 Fooling intersections of threshold functions 

Our approach also implies that the intersection of halfspaces (or even degree-2 threshold functions)
is fooled by bounded independence. While Theorem D.1 implies that $\Omega(\epsilon^{-8})$-wise independence
fools GW rounding, we can do much better by noting that to fool GW rounding it suffices to fool
the intersection of two halfspaces under the Gaussian measure.

This is because in the GW rounding scheme for MaxCut, each vertex $u$ is first mapped to
a vector $x_u$ of unit norm, and the side of a bipartition $u$ is placed in is decided by $\mathrm{sgn}(\langle x_u, r \rangle)$
for a random Gaussian vector $r$. For a vertex $u$, let $H_u^+$ be the halfspace $\langle x_u, r \rangle > 0$, and let
$H_u^-$ be the halfspace $\langle -x_u, r \rangle > 0$. Then note that the edge $(u, v)$ is cut if and only if $r \in
(H_u^+ \cap H_v^-) \cup (H_u^- \cap H_v^+)$, i.e. $r$ must be in the union of the intersection of two halfspaces. Thus if
we define the region $R^+$ to be the top-right quadrant of $\mathbb{R}^2$, and $R^-$ to be the bottom-left quadrant
of $\mathbb{R}^2$, then we are interested in fooling

$$\mathbf{E}[I_{R^+ \cup R^-}(\langle x_u, r \rangle, \langle -x_v, r \rangle)] = \mathbf{E}[I_{R^+}(\langle x_u, r \rangle, \langle -x_v, r \rangle)] + \mathbf{E}[I_{R^+}(\langle -x_u, r \rangle, \langle x_v, r \rangle)],$$

since the sum of such expectations over all edges $(u, v)$ gives us the expected number of edges
that are cut (note equality holds above since the two halfspace intersections are disjoint). The
following theorem then implies that to achieve a maximum cut within a factor $.878... - \epsilon$ of optimal
in expectation, it suffices that the entries of the random normal vector $r$ are
$\Omega(1/\epsilon^2)$-wise independent. The proof of the theorem is in Section F.

^1 We note that [30] uses a similar approach to obtain their PRGs for degree-$d$ PTFs. Their methods are not
directly applicable in our setting, one reason being that their notion of "regularity" is different from ours.
^2 A "random root-to-leaf path" corresponds to the standard uniform random walk on the tree.
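The cut condition for a single edge can be simulated directly. The Python sketch below is illustrative and not part of the paper: it works in $\mathbb{R}^2$ with fully independent Gaussians, and the helper names `cut_indicators` and `estimate_cut_probability` are assumptions. It checks that "the edge is cut" coincides with the sum of the two quadrant indicators, and estimates the cut probability, which for unit vectors at angle $\theta$ is the standard GW value $\theta/\pi$.

```python
import math
import random

def cut_indicators(x_u, x_v, r):
    """Return (is_cut, quadrant_sum) for one edge (u, v) and one vector r."""
    a = sum(p * q for p, q in zip(x_u, r))  # <x_u, r>
    b = sum(p * q for p, q in zip(x_v, r))  # <x_v, r>
    # The edge is cut when the two endpoints land on strictly opposite sides.
    is_cut = (a > 0 > b) or (b > 0 > a)
    # I_{R+}(<x_u, r>, <-x_v, r>) + I_{R+}(<-x_u, r>, <x_v, r>)
    quadrant_sum = int(a > 0 and -b > 0) + int(-a > 0 and b > 0)
    return is_cut, quadrant_sum

def estimate_cut_probability(theta, trials=20000, seed=0):
    """Monte Carlo estimate of Pr[edge is cut] for unit vectors at angle theta."""
    rng = random.Random(seed)
    x_u = (1.0, 0.0)
    x_v = (math.cos(theta), math.sin(theta))
    hits = 0
    for _ in range(trials):
        r = (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
        is_cut, quadrant_sum = cut_indicators(x_u, x_v, r)
        assert int(is_cut) == quadrant_sum  # the two events are disjoint
        hits += quadrant_sum
    return hits / trials

if __name__ == "__main__":
    theta = math.pi / 2
    print(estimate_cut_probability(theta), theta / math.pi)
```

Replacing the independent Gaussian source with a $k$-wise independent one is exactly what the theorem below is about; the sketch does not attempt that.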

Theorem 8.1. Let $H_1 = \{x : \langle a, x \rangle > \theta_1\}$ and $H_2 = \{x : \langle b, x \rangle > \theta_2\}$ be two halfspaces, with
$\|a\|_2 = \|b\|_2 = 1$. Let $X, Y$ be $n$-dimensional vectors of standard normals with the $X_i$ independent
and the $Y_i$ $k$-wise independent for $k = \Omega(1/\epsilon^2)$. Then $|\Pr[X \in H_1 \cap H_2] - \Pr[Y \in H_1 \cap H_2]| < \epsilon$.

The proof of Theorem 8.1 can be summarized in one sentence: FT-mollify the indicator function
of $\{x : x_1 \ge \theta_1, x_2 \ge \theta_2\} \subset \mathbb{R}^2$. We also discuss in Section F how our proof of Theorem 8.1 easily
generalizes to handle the intersection of $m$ halfspaces, or even $m$ degree-2 PTFs, for any constant
$m$, as well as generalizations to the case that $X, Y$ are Bernoulli vectors as opposed to Gaussian. Our
dependence on $m$ in all cases is polynomial.

Acknowledgments

We thank Piotr Indyk and Rocco Servedio for comments that improved the presentation of this 
work. We also thank Ryan O'Donnell for bringing our attention to the problem of the intersection 
of threshold functions. 

References 

[1] Noga Alon, Laszlo Babai, and Alon Itai. A fast and simple randomized parallel algorithm for 
the maximal independent set problem. J. Algorithms, 7(4):567-583, 1986. 

[2] James Aspnes, Richard Beigel, Merrick L. Furst, and Steven Rudich. The expressive power of 
voting polynomials. Combinatorica, 14(2): 1-14, 1994. 

[3] Per Austrin and Johan Håstad. Randomly supported independence and resistance. In Proceedings
of the 41st Annual ACM Symposium on Theory of Computing (STOC), pages 483-492,
2009.

[4] Louay Bazzi. Polylogarithmic independence can fool DNF formulas. In Proceedings of the 48th 
Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 63-73, 2007. 

[5] William Beckner. Inequalities in Fourier analysis. Annals of Mathematics, 102(1):159-182, 
1975. 

[6] Richard Beigel. Perceptrons, PP, and the Polynomial Hierarchy. Computational Complexity, 
4:339-349, 1994. 



[7] Itai Benjamini, Ori Gurel-Gurevich, and Ron Peled. On k-wise independent distributions and
boolean functions. Available at http://www.wisdom.weizmann.ac.il/~origurel/, 2007.



[8] Aline Bonami. Étude des coefficients de Fourier des fonctions de $L^p(G)$. Ann. Inst. Fourier,
20:335-402, 1970.

[9] Mark Braverman. Poly-logarithmic independence fools $AC^0$ circuits. In Proceedings of the
24th Annual IEEE Conference on Computational Complexity (CCC), pages 3-8, 2009.

[10] Jehoshua Bruck. Harmonic analysis of polynomial threshold functions. SIAM J. Discrete
Math., 3(2):168-177, 1990.

[11] Jehoshua Bruck and Roman Smolensky. Polynomial threshold functions, $AC^0$ functions and
spectral norms. SIAM J. Comput., 21(1):33-42, 1992.

[12] Anthony Carbery and James Wright. Distributional and $L^q$ norm inequalities for polynomials
over convex bodies in $\mathbb{R}^n$. Mathematical Research Letters, 8(3):233-248, 2001.

[13] Benny Chor and Oded Goldreich. On the power of two-point based sampling. Journal of 
Complexity, 5(1):96-106, March 1989. 

[14] Ilias Diakonikolas, Parikshit Gopalan, Ragesh Jaiswal, Rocco A. Servedio, and Emanuele Viola.
Bounded independence fools halfspaces. In Proceedings of the 50th Annual IEEE Symposium
on Foundations of Computer Science (FOCS), pages 171-180, 2009.

[15] Ilias Diakonikolas, Rocco A. Servedio, Li-Yang Tan, and Andrew Wan. A regularity
lemma, and low-weight approximators, for low-degree polynomial threshold functions. CoRR,
abs/0909.4727, 2009.

[16] Gerald B. Folland. How to integrate a polynomial over a sphere. Amer. Math. Monthly,
108(5):446-448, 2001.

[17] Kurt Otto Friedrichs. The identity of weak and strong extensions of differential operators. 
Transactions of the American Mathematical Society, 55(1):132-151, 1944. 

[18] Michel X. Goemans and David P. Williamson. Improved approximation algorithms for maxi- 
mum cut and satisfiability problems using semidefinite programming. J. ACM, 42:1115-1145, 
1995. 

[19] Mikael Goldmann, Johan Hastad, and Alexander A. Razborov. Majority gates vs. general 
weighted threshold gates. Computational Complexity, 2:277-300, 1992. 

[20] Parikshit Gopalan, Ryan O'Donnell, Yi Wu, and David Zuckerman. Fooling functions of 
halfspaces under product distributions. CoRR, abs/1001.1593, 2010. 

[21] Uffe Haagerup. The best constants in the Khintchine inequality. Studia Math., 70(3):231-283, 
1982. 

[22] Andras Hajnal, Wolfgang Maass, Pavel Pudlak, Mario Szegedy, and Gyorgy Turan. Threshold 
circuits of bounded depth. J. Comput. Syst. Sci., 46:129-154, 1993. 



[23] Prahladh Harsha, Adam Klivans, and Raghu Meka. An invariance principle for polytopes. In
Proceedings of the 42nd Annual ACM Symposium on Theory of Computing (STOC), to appear
(see also CoRR abs/0912.4884), 2010.

[24] Piotr Indyk. Stable distributions, pseudorandom generators, embeddings, and data stream 
computation. J. ACM, 53(3):307-323, 2006. 

[25] Daniel M. Kane, Jelani Nelson, and David P. Woodruff. On the exact space complexity of 
sketching and streaming small norms. In Proceedings of the 21st Annual ACM-SIAM Sympo- 
sium on Discrete Algorithms (SODA), pages 1161-1178, 2010. 

[26] Adam R. Klivans, Ryan O'Donnell, and Rocco A. Servedio. Learning intersections and thresholds
of halfspaces. J. Comput. Syst. Sci., 68(4):808-840, 2004.

[27] Adam R. Klivans and Rocco A. Servedio. Learning DNF in time $2^{\tilde{O}(n^{1/3})}$. J. Comput. Syst.
Sci., 68(2):303-318, 2004.

[28] Matthias Krause and Pavel Pudlak. Computing boolean functions by polynomials and thresh- 
old circuits. Computational Complexity, 7(4):346-370, 1998. 

[29] Sanjeev Mahajan and Ramesh Hariharan. Derandomizing semidefinite programming based 
approximation algorithms. In Proceedings of the 36th Symposium on Foundations of Computer 
Science (FOCS), pages 162-169, 1995. 

[30] Raghu Meka and David Zuckerman. Pseudorandom generators for polynomial threshold func- 
tions. In Proceedings of the 42nd Annual ACM Symposium on Theory of Computing (STOC), 
to appear (see also CoRR abs/0910.4122), 2010. 

[31] Marvin A. Minsky and Seymour L. Papert. Perceptrons. MIT Press, Cambridge, MA, 1969 
(expanded edition 1988). 

[32] Elchanan Mossel, Ryan O'Donnell, and Krzysztof Oleszkiewicz. Noise stability of functions 
with low influences: invariance and optimality. Annals of Mathematics (to appear), 2010. 

[33] Noam Nisan. Pseudorandom bits for constant depth circuits. Combinatorica, 11(1):63-70,
1991.

[34] Noam Nisan. The communication complexity of threshold gates. In Proceedings of Combinatorics,
Paul Erdős is Eighty, pages 301-315, 1994.

[35] Ryan O'Donnell and Rocco A. Servedio. Extremal properties of polynomial threshold functions. 
J. Comput. Syst. Sci., 74(3):298-312, 2008. 

[36] Alexander A. Razborov. A simple proof of Bazzi's theorem. ACM Transactions on Computa- 
tion Theory, 1(1), 2009. 

[37] Alexander A. Razborov and Alexander A. Sherstov. The sign-rank of $AC^0$. In Proceedings of
the 49th Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 57-66,
2008.



[38] Michael E. Saks. Slicing the hypercube, pages 211-257. London Mathematical Society Lecture 
Note Series 187, 1993. 

[39] Alexander A. Sherstov. The intersection of two halfspaces has high threshold degree. In Pro- 
ceedings of the 50th Annual IEEE Symposium on Foundations of Computer Science (FOCS), 
2009. 

[40] D. Sivakumar. Algorithmic derandomization via complexity theory. In Proceedings of the 34th 
Annual ACM Symposium on Theory of Computing (STOC), pages 619-626, 2002. 

[41] Gilbert Strang. Introduction to Linear Algebra. Wellesley-Cambridge Press, 4th edition, 2009.

[42] Peter Whittle. Bounds for the moments of linear and quadratic forms in independent variables.
Theory Probab. Appl., 5(3):302-305, 1960.



A Basic linear algebra facts 

In this subsection we record some basic linear algebraic facts used in our proofs. 

We start with two elementary facts. 
Fact A.1. If $A, P \in \mathbb{R}^{n \times n}$ with $P$ invertible, then the eigenvalues of $A$ and $P^{-1}AP$ are identical.
Fact A.2. For $A \in \mathbb{R}^{n \times n}$ with eigenvalues $\lambda_1, \ldots, \lambda_n$, and for integer $k > 0$, $\mathrm{tr}(A^k) = \sum_i \lambda_i^k$.

Note Fact A.1 and Fact A.2 imply the following.
Fact A.3. For a real matrix $A \in \mathbb{R}^{n \times n}$ and invertible matrix $P \in \mathbb{R}^{n \times n}$,

$$\|P^{-1}AP\|_2 = \|A\|_2.$$

The following standard result will be useful: 

Theorem A.4 (Spectral Theorem [41, Section 6.4]). If $A \in \mathbb{R}^{n \times n}$ is symmetric, there exists an
orthogonal $Q \in \mathbb{R}^{n \times n}$ with $\Lambda = Q^T A Q$ diagonal. In particular, all eigenvalues of $A$ are real.

Definition A.5. For a real symmetric matrix $A$, we define $\lambda_{\min}(A)$ to be the smallest magnitude
of a non-zero eigenvalue of $A$ (in the case that all eigenvalues are 0, we set $\lambda_{\min}(A) = 0$). We define
$\|A\|_\infty$ to be the largest magnitude of an eigenvalue of $A$.

We now give a simple lemma that gives an upper bound on the magnitude of the trace of a 
symmetric matrix with positive eigenvalues. 

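Lemma A.6. Let $A \in \mathbb{R}^{n \times n}$ be symmetric with $\lambda_{\min}(A) > 0$. Then $|\mathrm{tr}(A)| \le \|A\|_2^2/\lambda_{\min}(A)$.
Proof. We have

$$|\mathrm{tr}(A)| = \left|\sum_{i=1}^{n} \lambda_i\right| \le \sqrt{\frac{\|A\|_2^2}{(\lambda_{\min}(A))^2}} \cdot \left(\sum_{i=1}^{n} \lambda_i^2\right)^{1/2} = \frac{\|A\|_2^2}{\lambda_{\min}(A)}.$$

We note $\sum_{i=1}^{n} \lambda_i^2 = \|A\|_2^2$, implying the final equality. Also, there are at most
$\|A\|_2^2/(\lambda_{\min}(A))^2$ non-zero $\lambda_i$. The sole inequality then follows by Cauchy-Schwarz. ■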



B Useful facts about polynomials 

B.1 Facts about low-degree polynomials. We view $\{-1, 1\}^n$ as a probability space endowed
with the uniform probability measure. For a function $f : \{-1, 1\}^n \to \mathbb{R}$ and $r \ge 1$, we let $\|f\|_r$
denote $(\mathbf{E}_x[|f(x)|^r])^{1/r}$.

Our first fact is a consequence of the well-known hypercontractivity theorem. 

Theorem B.1 (Hypercontractivity [5, 8]). If $f$ is a degree-$d$ polynomial and $1 < r < q < \infty$,

$$\|f\|_q \le \left(\frac{q - 1}{r - 1}\right)^{d/2} \|f\|_r.$$



Our second fact is an anticoncentration theorem for low-degree polynomials over independent
standard Gaussian random variables.

Theorem B.2 (Gaussian Anticoncentration [12]). For $f$ a non-zero, $n$-variate, degree-$d$ polynomial,

$$\Pr[|f(G_1, \ldots, G_n) - t| \le \epsilon \cdot \mathrm{Var}[f]] = O(d\epsilon^{1/d})$$

for all $\epsilon \in (0, 1)$ and $t \in \mathbb{R}$. Here $G_1, \ldots, G_n \sim \mathcal{N}(0, 1)$ are independent. (Here, and henceforth,
$\mathcal{N}(\mu, \sigma^2)$ denotes the Gaussian distribution with mean $\mu$ and variance $\sigma^2$.)

The following is a statement of the Invariance Principle of Mossel, O'Donnell, and Oleszkiewicz
[32], in the special case when the random variables $X_i$ are Bernoulli.

Theorem B.3 (Invariance Principle [32]). Let $X_1, \ldots, X_n$ be independent $\pm 1$ Bernoulli, and let
$p$ be a degree-$d$ multilinear polynomial with $\sum_{|S| > 0} p_S^2 = 1$ and $\max_i \mathrm{Inf}_i(p) \le \tau$. Then

$$\sup_t |\Pr[p(X_1, \ldots, X_n) \le t] - \Pr[p(G_1, \ldots, G_n) \le t]| = O(d\tau^{1/(4d+1)})$$

where the $G_i \sim \mathcal{N}(0, 1)$ are independent.

The following tail bound argument is standard (see for example [3]). We repeat the argument 
here just to point out that only bounded independence is required. 

Theorem B.4 (Tail bound). If $f$ is a degree-$d$ polynomial, $t > 8^{d/2}$, and $X$ is drawn at random
from a $(dt^{2/d})$-wise independent distribution over $\{-1, 1\}^n$, then

$$\Pr[|f(X)| \ge t\|f\|_2] = \exp(-\Omega(dt^{2/d})).$$

Proof. Suppose $k \ge 2$. By Theorem B.1,

$$\mathbf{E}[|f(X)|^k] \le k^{dk/2} \cdot \|f\|_2^k,$$

implying

$$\Pr[|f(X)| \ge t\|f\|_2] \le (k^{d/2}/t)^k \qquad (B.1)$$

by Markov's inequality. Set $k = 2 \cdot \lfloor t^{2/d}/4 \rfloor$ and note $k \ge 2$ as long as $t \ge 8^{d/2}$. Now the right hand
side of Eq. (B.1) is at most $2^{-dk/2}$, as desired. Finally, note independence was only used to bound
$\mathbf{E}[|f(X)|^k]$, which for $k$ even equals $\mathbf{E}[f(X)^k]$ and is thus determined by $dk$-independence. ■
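As a quick numerical sanity check of the last step of this proof, the Python sketch below (an illustration, not part of the paper; the function name `markov_moment_bound` is an assumption) computes $k = 2\lfloor t^{2/d}/4 \rfloor$ and compares the right-hand side of Eq. (B.1) with the claimed bound $2^{-dk/2}$. The comparison works because $k \le t^{2/d}/2$ gives $k^{d/2}/t \le 2^{-d/2}$.

```python
import math

def markov_moment_bound(d, t):
    """Return (k, bound, target) for the choice k = 2 * floor(t^(2/d) / 4)."""
    k = 2 * math.floor(t ** (2.0 / d) / 4.0)
    if k < 2:
        raise ValueError("t is too small: the argument needs k >= 2")
    bound = (k ** (d / 2.0) / t) ** k   # right-hand side of Eq. (B.1)
    target = 2.0 ** (-d * k / 2.0)      # claimed upper bound 2^(-dk/2)
    return k, bound, target

if __name__ == "__main__":
    for d, t in [(2, 8.0), (2, 100.0), (3, 100.0)]:
        k, bound, target = markov_moment_bound(d, t)
        print(d, t, k, bound <= target * (1 + 1e-9))
```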

B.2 Facts about quadratic forms. The following facts are concerned with quadratic forms,
i.e. polynomials $p(x) = \sum_{i \le j} a_{i,j} x_i x_j$. We often represent a quadratic form $p$ by its associated
symmetric matrix $A_p$, where

$$(A_p)_{i,j} = \begin{cases} a_{i,j}/2, & i < j \\ a_{j,i}/2, & i > j \\ a_{i,j}, & i = j \end{cases}$$

so that $p(x) = x^T A_p x$.
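The correspondence between the coefficients $a_{i,j}$ and the matrix $A_p$ can be verified mechanically. This is a minimal Python sketch with hypothetical helper names (`associated_matrix`, `eval_polynomial`, `eval_matrix_form`) and zero-based indices, assuming the coefficients are given as a dictionary keyed by pairs $(i, j)$ with $i \le j$.

```python
import itertools

def associated_matrix(coeffs, n):
    """Build the symmetric matrix A_p from coefficients {(i, j): a_ij} with i <= j."""
    A = [[0.0] * n for _ in range(n)]
    for (i, j), a in coeffs.items():
        if i == j:
            A[i][i] = a
        else:
            # An off-diagonal coefficient is split evenly between the two symmetric entries.
            A[i][j] = a / 2.0
            A[j][i] = a / 2.0
    return A

def eval_polynomial(coeffs, x):
    """Evaluate p(x) = sum over i <= j of a_ij * x_i * x_j."""
    return sum(a * x[i] * x[j] for (i, j), a in coeffs.items())

def eval_matrix_form(A, x):
    """Evaluate x^T A x."""
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))

if __name__ == "__main__":
    coeffs = {(0, 0): 0.5, (0, 1): 1.5, (0, 2): 3.0, (1, 2): -2.0}
    A = associated_matrix(coeffs, 3)
    for x in itertools.product([-1, 1], repeat=3):
        print(x, eval_polynomial(coeffs, x), eval_matrix_form(A, x))
```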

The following is a bound on moments for quadratic forms. 

Lemma B.5. Let $f(x)$ be a degree-2 polynomial. Then, for $X = (X_1, \ldots, X_n)$ a vector of
independent Bernoullis,

$$\mathbf{E}[|f(X)|^k] \le 2^{O(k)}\left(\|A_f\|_2^k k^k + |\mathrm{tr}(A_f)|^k\right).$$

Proof. Over the hypercube we can write $f = q + \mathrm{tr}(A_f)$ where $q$ is multilinear. Note $\|A_q\|_2 \le
\|A_f\|_2$. Then by Theorem B.1,

$$\mathbf{E}[|f(x)|^k] = \mathbf{E}[|q(x) + \mathrm{tr}(A_f)|^k] \le 2^{O(k)} \cdot \sum_{j=0}^{k} (\|A_f\|_2 \cdot k)^{k-j} \, |\mathrm{tr}(A_f)|^j = 2^{O(k)} \max\{\|A_f\|_2 \cdot k, |\mathrm{tr}(A_f)|\}^k.$$

■

The following corollary now follows from Theorem B.4 and Lemma A.6.

Corollary B.6. Let $f$ be a quadratic form with $A_f$ positive semidefinite, $\|A_f\|_2 \le 1$, and
$\lambda_{\min}(A_f) \ge \delta$ for some $\delta \in (0, 1]$. Then, for $x$ chosen at random from a $\lceil 2/\delta \rceil$-independent family
over $\{-1, 1\}^n$,

$$\Pr[f(x) > 2/\delta] = \exp(-\Omega(1/\delta)).$$

Proof. Write $f = g + C$ via Lemma A.6 with $0 \le C \le 1/\delta$ and $g$ multilinear, $\|A_g\|_2 \le \|A_f\|_2 \le 1$.
Apply Theorem B.4 to $g$ with $t = 1/\delta$. ■
The following lemma gives a decomposition of any multilinear quadratic form as a sum of
quadratic forms with special properties for the associated matrices. It is used in the proof of
Theorem 6.1.

Lemma B.7. Let $\delta > 0$ be given. Let $f$ be a multilinear quadratic form. Then $f$ can be written
as $f_1 - f_2 + f_3$ for quadratic forms $f_1, f_2, f_3$ where:

1. $A_{f_1}, A_{f_2}$ are positive semidefinite with $\lambda_{\min}(A_{f_1}), \lambda_{\min}(A_{f_2}) \ge \delta$.

2. $\|A_{f_3}\|_\infty < \delta$.

3. $\|A_{f_1}\|_2, \|A_{f_2}\|_2, \|A_{f_3}\|_2 \le \|A_f\|_2$.

Proof. Since $A_f$ is real and symmetric, we can find an orthogonal matrix $Q$ such that $\Lambda = Q^T A_f Q$
is diagonal. Each diagonal entry of $\Lambda$ is either at least $\delta$, at most $-\delta$, or in between. We create a
matrix $P$ containing all entries of $\Lambda$ which are at least $\delta$, with the others zeroed out. We similarly
create $N$ to have all entries at most $-\delta$. We place the remaining entries in $R$. We then set
$A_{f_1} = QPQ^T$, $A_{f_2} = -QNQ^T$, $A_{f_3} = QRQ^T$. Note $\|\Lambda\|_2^2 = \|A_f\|_2^2$ by Fact A.3, so since we remove
terms from $\Lambda$ to form each $A_{f_i}$, their Frobenius norms can only shrink. The eigenvalue bounds hold
by construction and Fact A.1. ■

C Why the previous approaches failed 

In this section, we attempt to provide an explanation as to why the approaches of [14] and [25] fail
to fool degree-2 PTFs.

C.1 Why the approximation theory approach failed. The analysis in [14] crucially exploits
the strong concentration and anti-concentration properties of the Gaussian distribution. (Recall that
in the linear regular case, the random variable $\langle w, x \rangle$ is approximately Gaussian.) Now consider a
regular degree-2 polynomial $p$ and the corresponding PTF $f = \mathrm{sgn}(p)$. Since $p$ is regular, it
still has "good" concentration and anti-concentration properties, though quantitatively inferior to
those of the Gaussian. Hence, one would hope to argue as follows: use the univariate polynomial
$P$ (constructed using approximation theory), allowing its degree to increase if necessary, and carry
out the analysis of the error as in the linear case.

This fails because the (tight) concentration properties of $p$, as implied by hypercontractivity,
are not sufficient for the analysis to bound the error of the approximation, even if
we let the degree of the polynomial $P$ tend to infinity. (Paradoxically, the error coming from the
worst-case analysis becomes worse as the degree of $P$ increases.)

Without going into further details, we mention that an additional problem for univariate
approximations to work is this: the (tight) anti-concentration properties of $p$, obtained via the
Invariance Principle and the anti-concentration bounds of [12], are quantitatively weaker than
what is required to bound the error, even in the region where $P$ has small pointwise error (from
the sgn function).

C.2 Why the analysis for univariate FT-mollification failed. We discuss why the argument
in [25] failed to generalize to higher degree. Recall that the argument was via the following chain
of inequalities:

$$\mathbf{E}[I_{[0,\infty)}(p(X))] \approx_\epsilon \mathbf{E}[\tilde{I}^c_{[0,\infty)}(p(X))] \approx_\epsilon \mathbf{E}[\tilde{I}^c_{[0,\infty)}(p(Y))] \approx_\epsilon \mathbf{E}[I_{[0,\infty)}(p(Y))] \qquad (C.1)$$

The step that fails for high-degree PTFs is the second inequality in Eq. (C.1), which was argued
by Taylor's theorem. Our bounds on derivatives of $\tilde{I}^c_{[0,\infty)}$, the FT-mollification of $I_{[0,\infty)}$ for a
certain parameter $c = c(\epsilon)$ chosen to make sure $|I_{[0,\infty)} - \tilde{I}^c_{[0,\infty)}| < \epsilon$ "almost everywhere", are such that
$\|(\tilde{I}^c_{[0,\infty)})^{(k)}\|_\infty \ge 1$ for all $k$. Thus, we have that the error term from Taylor's theorem is at least
$\mathbf{E}[(p(X))^k]/k!$. The problem comes from the numerator. Since we can assume the sum of squared
coefficients of $p$ is 1 (note the sgn function is invariant to scaling of its argument), known (and
tight) moment bounds (via hypercontractivity) only give us an upper bound on $\mathbf{E}[(p(x))^k]$ which
is larger than $k^{dk/2}$, where $\mathrm{degree}(p) = d$. Thus, the error from Taylor's theorem does not decrease
to zero by increasing $k$ for $d \ge 2$, since we only are able to divide by $k! \le k^k$ (in fact, strangely,
increasing the amount of independence $k$ worsens this bound).

D Proofs omitted from Section 6

D.1 Boolean setting. We next give a proof of Lemma 6.4, where $p_1, p_2, \delta$ are as in Section 6
(recall $p = p_1 - p_2 + p_3 + p_4 + C$ where $p_1, p_2$ are positive semidefinite with minimum non-zero
eigenvalues at least $\delta$).

Lemma 6.4 (restatement). Let $\eta, \eta' \ge 0$, $t \in \mathbb{R}$ be given, and let $X_1, \ldots, X_n$ be independent
Bernoulli. Then

$$\Pr[|p(X) - t| \le \eta \cdot (\sqrt{p_1(X)} + \sqrt{p_2(X)} + 1) + \eta'] = O(\sqrt{\eta'} + (\eta^2/\delta)^{1/4} + \tau^{1/9} + \exp(-\Omega(1/\delta))).$$

Proof. Applying Corollary B.6, we have

$$\Pr[\sqrt{p_1(X)} \ge \sqrt{2/\delta}] = \exp(-\Omega(1/\delta)),$$

and similarly for $\sqrt{p_2(X)}$. We can thus bound our desired probability by

$$\Pr[|p(X) - t| \le 2\eta\sqrt{2/\delta} + \eta + \eta'] + \exp(-\Omega(1/\delta)).$$

By Theorem B.2 together with Theorem B.3, we can bound the probability in the lemma statement
by

$$O(\sqrt{\eta'} + (\eta^2/\delta)^{1/4} + \tau^{1/9} + \exp(-\Omega(1/\delta))).$$

■

Corollary 6.6 (restatement). Let $\eta, \eta' \ge 0$ be given, and let $Y_1, \ldots, Y_n$ be $k$-independent
Bernoulli for $k$ as in Lemma 6.5 with $\epsilon' = \min\{\eta/\sqrt{\delta}, \eta'\}$. Also assume $k \ge \lceil 2/\delta \rceil$. Then

$$\Pr[|p(Y) - t| \le \eta \cdot (\sqrt{p_1(Y)} + \sqrt{p_2(Y)} + 1) + \eta'] = O(\sqrt{\eta'} + (\eta^2/\delta)^{1/4} + \tau^{1/9} + \exp(-\Omega(1/\delta))).$$

Proof. There were two steps in the proof of Lemma 6.4 which required using the independence
of the $X_i$. The first was in the application of Corollary B.6, but that only required $\lceil 2/\delta \rceil$-wise
independence, which is satisfied here. The next was in using the anticoncentration of $p(X)$ (the
fact that $\Pr[|p(X) - t| < s] = O(\sqrt{s} + \tau^{1/9})$ for any $t \in \mathbb{R}$ and $s > 0$). However, given Lemma 6.5,
anticoncentration still holds under $k$-independence. ■

D.2 Gaussian setting. In the following Theorem we show that the conclusion of Theorem 6.1
holds even under the Gaussian measure.

Theorem D.1. Let $0 < \epsilon < 1$ be given. Let $G = (G_1, \ldots, G_n)$ be a vector of independent
standard normal random variables, and $G' = (G'_1, \ldots, G'_n)$ be a vector of $2k$-wise independent
standard normal random variables for $k$ a sufficiently large multiple of $1/\epsilon^8$. If $p(x) = \sum_{i \le j} a_{i,j} x_i x_j$
has $\sum_{i \le j} a_{i,j}^2 = 1$,

$$\mathbf{E}[\mathrm{sgn}(p(G))] - \mathbf{E}[\mathrm{sgn}(p(G'))] = O(\epsilon).$$

Proof. Our proof is by a reduction to the Bernoulli case, followed by an application of Theorem 6.1.
We replace each $G_i$ with $Z_i = \sum_{j=1}^{N} X_{i,j}/\sqrt{N}$ for a sufficiently large $N$ to be determined later. We
also replace each $G'_i$ with $Z'_i = \sum_{j=1}^{N} Y_{i,j}/\sqrt{N}$. We determine these $X_{i,j}, Y_{i,j}$ as follows. Let $\Phi : \mathbb{R} \to [0,1]$
be the cumulative distribution function (CDF) of the standard normal. Define $T_{-1,N} = -\infty$,
$T_{N,N} = \infty$, and $T_{k,N} = \Phi^{-1}(2^{-N} \sum_{j=0}^{k} \binom{N}{j})$ for $0 \le k < N$. Now, after a $G_i$ is chosen according
to a standard normal distribution, we identify the unique $k_i$ such that $T_{k_i-1,N} \le G_i < T_{k_i,N}$. We
then randomly select a subset of $k_i$ of the $X_{i,j}$ to make 1, and we set the others to $-1$. The $Y_{i,j}$
are defined similarly. It should be noted that the $X_{i,j}, Y_{i,j}$ are Bernoulli random variables, with
the $X_{i,j}$ being independent and the $Y_{i,j}$ being $2k$-wise independent. Furthermore, we define the
$nN$-variate polynomial $p' : \{-1,1\}^{nN} \to \mathbb{R}$ to be the one obtained from this procedure, so that
$p(Z) = p'(X)$. We then define $p''(x) = \alpha \cdot p'(x)$ for $\alpha = (\sum_{i<j} a_{i,j}^2 + (1 - 1/N) \sum_i a_{i,i}^2)^{-1/2}$ so that
the sum of squared coefficients in $p''$ (ignoring constant terms, some of which arise because the $x_{i,j}^2$
terms are 1 on the hypercube) is 1. It should be observed that $1 \le \alpha \le 1 + 1/(N-1)$.
Now, we make the setting $\varepsilon = \log^{1/3}(N)/\sqrt{N}$. By the Chernoff bound,

$\Pr[|k_i - N/2| \ge \varepsilon N/2] = o(1)$ as $N$ grows. (D.1)
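
As a rough numerical illustration of Eq. (D.1), the following sketch computes the exact two-sided binomial tail for $k_i \sim \mathrm{Binomial}(N, 1/2)$ with $\varepsilon = \log^{1/3}(N)/\sqrt{N}$. The function name `central_tail` is hypothetical and not part of the paper; the point is only that the tail shrinks (slowly) as $N$ grows.

```python
import math

def central_tail(N: int) -> float:
    """Exact Pr[|k - N/2| >= eps*N/2] for k ~ Binomial(N, 1/2),
    with eps = log^{1/3}(N) / sqrt(N) as in the text (illustrative helper)."""
    eps = math.log(N) ** (1.0 / 3.0) / math.sqrt(N)
    dev = eps * N / 2.0
    c = 1          # c holds C(N, k), updated iteratively with exact integers
    tail = 0
    for k in range(N + 1):
        if abs(k - N / 2.0) >= dev:
            tail += c
        c = c * (N - k) // (k + 1)
    return tail / 2 ** N

for N in (100, 10_000):
    print(N, central_tail(N))
```

The decay is slow because the deviation $\varepsilon N/2$ is only $\log^{1/3}(N)$ standard deviations, which is all the argument needs.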

Claim D.2. If $(1 - \varepsilon)N/2 \le k_i \le (1 + \varepsilon)N/2$, then $|T_{k_i,N} - T_{k_i+1,N}| = o(1)$.
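
The coupling between a Gaussian $G_i$ and the Bernoulli sum $Z_i$ can be sketched in a few lines, assuming the thresholds are $T_{k,N} = \Phi^{-1}(2^{-N}\sum_{j \le k}\binom{N}{j})$. The helper names (`thresholds`, `couple`, `z_value`, `max_central_gap`) are illustrative only. The last helper measures the largest gap between adjacent thresholds in the central range of Claim D.2.

```python
import math
from bisect import bisect_right
from statistics import NormalDist

STD_NORMAL = NormalDist()

def thresholds(N: int) -> list:
    """Return [T_{0,N}, ..., T_{N-1,N}]; T_{-1,N} = -inf and T_{N,N} = +inf are implicit."""
    total = 2 ** N
    out, cum, c = [], 0, 1   # c holds C(N, k)
    for k in range(N):
        cum += c             # cum = sum_{j <= k} C(N, j)
        if 2 * cum <= total:
            out.append(STD_NORMAL.inv_cdf(cum / total))
        else:
            # Use symmetry in the upper tail so the probability never rounds to 1.0.
            out.append(-STD_NORMAL.inv_cdf((total - cum) / total))
        c = c * (N - k) // (k + 1)
    return out

def couple(g: float, thr: list) -> int:
    """Unique k with T_{k-1,N} <= g < T_{k,N}: the number of +1 signs to use."""
    return bisect_right(thr, g)

def z_value(k: int, N: int) -> float:
    """Z = (k signs equal to +1, N - k equal to -1) summed and divided by sqrt(N)."""
    return (2 * k - N) / math.sqrt(N)

def max_central_gap(N: int) -> float:
    """Largest T_{k+1,N} - T_{k,N} over k with |k - N/2| <= eps*N/2."""
    eps = math.log(N) ** (1.0 / 3.0) / math.sqrt(N)
    thr = thresholds(N)
    ks = [k for k in range(N - 1) if abs(k - N / 2) <= eps * N / 2]
    return max(thr[k + 1] - thr[k] for k in ks)

print(max_central_gap(16), max_central_gap(256))
```

By construction $\Pr[k_i = k] = 2^{-N}\binom{N}{k}$ when $G_i$ is standard normal, so $k_i$ is binomial and the signs can be assigned to a uniformly random subset of size $k_i$.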
Before proving the claim, we show how we can now use it to prove our Theorem. We argue by
the following chain of inequalities:

$\mathbf{E}[\mathrm{sgn}(p(G))] \approx_\varepsilon \mathbf{E}[\mathrm{sgn}(p''(X))] \approx_\varepsilon \mathbf{E}[\mathrm{sgn}(p''(Y))] \approx_\varepsilon \mathbf{E}[\mathrm{sgn}(p(G'))]$.

E[sgn(p(G))] E[sgn(p"(X))] : First we condition on the event £ that \Zi — Gi\ < jr? for aU 
i G [n]; this happens with probability 1 — o(l) as grows by coupling Claim [P^ and Eq. (jD.l[) . and 
applying a union bound over all i G [n]. We also condition on the event E' that = 0(Y^log(n/e)) 
for all i G [n], which happens with probability \ — e^ by a union bound over i G [n] since a standard 
normal random variable has probability e~^*-^ ^ of being larger than x in absolute value. Now, 
conditioned on E^E' ^ we have 

\p(G)-^'{X)\<n\e'lr?f^{e-ln^)Y,\G,\ W^wA < + (^3/^2^ . 0( ^biRi)) • ^ |ai,,-|. 

« \ 3 / i,j 

We note Ylij'^lj — 1; ^'^'^ t\ms ^ by Cauchy-Schwarz. We thus have that \p'{X) — 

p{G)\ < with probability at least 1 - and thus \p"{X) - p{G)\ < + \{a - 1) ■ p{X)\ with 
probability at least 1 — e^. We finally condition on the event E" that |(a — 1) ■p'{X)\ < . Since 
p' can be written as a multilinear quadratic form with sum of squared coefficients at most 1, plus 
its trace tr(^p/) (which is < y/n, by Cauchy-Schwarz), we have 

Pr[|(a - 1) ■p'{X)\ > a < Pr[\p'{X)\ > ■ {N - 1)] = o(l), 

which for large enough N and the fact that ||p'||2 = 0(1 + tr(^p')) irrespective of N, is at most 

Pr[|p'(X)| >c-log(l/e)||y||2], 

for a constant c we can make arbitrarily large by increasing N. We thus have Pr[if"] > 1 — 
by Theorem B.4. Now, conditioned on $E \wedge E' \wedge E''$, $\mathrm{sgn}(p''(X)) \ne \mathrm{sgn}(p(G))$ can only occur if
$|p''(X)| = O(\varepsilon^2)$. However, by anticoncentration (Theorem B.2) and the Invariance Principle
(Theorem B.3), this occurs with probability $O(\varepsilon)$ for $N$ sufficiently large (note the maximum
influence of $p''$ goes to 0 as $N \to \infty$).

$\mathbf{E}[\mathrm{sgn}(p''(X))] \approx_\varepsilon \mathbf{E}[\mathrm{sgn}(p''(Y))]$: Since the maximum influence $\tau$ of any $X_{i,j}$ in $p''$ approaches
0 as $N \to \infty$, we can apply Theorem 6.1 for $N$ sufficiently large (and thus $\tau$ sufficiently small).

$\mathbf{E}[\mathrm{sgn}(p''(Y))] \approx_\varepsilon \mathbf{E}[\mathrm{sgn}(p(G'))]$: This case is argued identically as in the first inequality, except
that we use anticoncentration of $p''(Y)$, which follows from Lemma 6.5, and we should ensure that
we have sufficient independence to apply Theorem B.4 with $t = O(\log(1/\varepsilon))$, which we do.
Proof (of Claim D.2). The claim is argued by showing that for $k_i$ sufficiently close to its
expectation (which is $N/2$), the density function of the Gaussian (i.e. the derivative of its CDF)
is sufficiently large that the distance we must move from $T_{k_i,N}$ to $T_{k_i+1,N}$ to change the CDF
by $O(1/\sqrt{N})$ (each binomial term $2^{-N}\binom{N}{k}$ is at most this large) is small. We argue the case
$(1 - \varepsilon)N/2 \le k_i \le N/2$ since the case $N/2 \le k_i \le (1 + \varepsilon)N/2$ is argued symmetrically. Also, we
consider only the case $k_i = (1 - \varepsilon)N/2$ exactly, since the magnitude of the standard normal density
function is smallest in this case.
Observe that each $Z_i$ is a degree-1 polynomial in the $X_{i,j}$ with maximum influence $1/N$, and
thus by the Berry-Esseen Theorem,

$\sup_t |\Pr[Z_i \le t] - \Pr[G_i \le t]| \le 1/\sqrt{N}$.

Also note that

$\Pr[G_i \le T_{k_i,N}] = \Pr[Z_i \le (2k_i - N)/\sqrt{N}]$

by construction. We thus have

$\Pr[G_i \le T_{k_i,N}] = \Pr[Z_i \le -\log^{1/3}(N)] = \Pr[G_i \le -\log^{1/3}(N)] \pm 1/\sqrt{N}$.

By a similar argument we also have

$\Pr[G_i \le T_{k_i+1,N}] = \Pr[Z_i \le -\log^{1/3}(N) + 2/\sqrt{N}] = \Pr[G_i \le -\log^{1/3}(N) + 2/\sqrt{N}] \pm 1/\sqrt{N}$.

Note though for $t = \Theta(\log^{1/3}(N))$, the density function $f$ of the standard normal satisfies $f(t) =
e^{-\Theta(t^2)} = N^{-o(1)}$. Thus, in this regime we can change the CDF by $O(1/\sqrt{N})$ by moving only
$N^{o(1)}/\sqrt{N} = o(1)$ along the real axis, implying $T_{k_i+1,N} - T_{k_i,N} = o(1)$. ■



E Proofs from Section 7

E.1 Proof of Theorem 7.1 We begin by stating the following structural lemma:

Theorem E.1. Let $f(x) = \mathrm{sgn}(p(x))$ be any degree-$d$ PTF. Fix any $\tau > 0$. Then $f$ is equivalent
to a decision tree $T$ of depth $\mathrm{depth}(d, \tau) := (1/\tau) \cdot (d \log(1/\tau))^{O(d)}$ with variables at the internal
nodes and a degree-$d$ PTF $f_\rho = \mathrm{sgn}(p_\rho)$ at each leaf $\rho$, with the following property: with probability
at least $1 - \tau$, a random path from the root reaches a leaf $\rho$ such that either: (i) $f_\rho$ is a $\tau$-regular
degree-$d$ PTF, or (ii) for any $O(d \cdot \log(1/\tau))$-independent distribution $\mathcal{D}'$ over $\{-1,1\}^{n-|\rho|}$ there
exists $b \in \{-1, 1\}$ such that $\Pr_{x \sim \mathcal{D}'}[f_\rho(x) \ne b] \le \tau$.

We now prove Theorem 7.1 assuming Theorem E.1. We will need some notation. Consider a leaf
of the tree $T$. We will denote by $\rho$ both the set of variables that appear on the corresponding root-
to-leaf path and the corresponding partial assignment; the distinction will be clear from context.
Let $|\rho|$ be the number of variables on the path. We identify a leaf $\rho$ with the corresponding restricted
subfunction $f_\rho = \mathrm{sgn}(p_\rho)$. We call a leaf "good" if it corresponds to either a $\tau$-regular PTF or to a
"close-to-constant" function. We call a leaf "bad" otherwise. We denote by $L(T)$, $GL(T)$, $BL(T)$
the sets of leaves, good leaves and bad leaves of $T$ respectively.

In the course of the proof we make repeated use of the following standard fact: 

Fact E.2. Let $\mathcal{D}$ be a $k$-wise independent distribution over $\{-1,1\}^n$. Condition on any fixed
values for any $t \le k$ bits of $\mathcal{D}$, and let $\mathcal{D}'$ be the projection of $\mathcal{D}$ on the other $n - t$ bits. Then $\mathcal{D}'$
is $(k - t)$-wise independent.
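
Fact E.2 can be checked by brute force on a small example. The sketch below uses the even-parity distribution on 5 bits, a standard 4-wise independent distribution chosen here purely for illustration. The helper names `is_k_wise_independent` and `condition_and_project` are hypothetical.

```python
import math
from itertools import combinations, product

def is_k_wise_independent(support, k):
    """True if the uniform distribution over `support` (a list of +/-1 tuples)
    is uniform on every set of k coordinates."""
    n = len(support[0])
    for coords in combinations(range(n), k):
        counts = {}
        for x in support:
            key = tuple(x[i] for i in coords)
            counts[key] = counts.get(key, 0) + 1
        # Every one of the 2^k patterns must appear, all equally often.
        if len(counts) != 2 ** k or len(set(counts.values())) != 1:
            return False
    return True

def condition_and_project(support, fixed):
    """Condition on x[i] == v for each (i, v) in `fixed`, then drop those coordinates."""
    keep = [i for i in range(len(support[0])) if i not in fixed]
    return [tuple(x[i] for i in keep) for x in support
            if all(x[i] == v for i, v in fixed.items())]

# Uniform over strings in {-1,1}^5 whose coordinates multiply to +1.
parity = [x for x in product((-1, 1), repeat=5) if math.prod(x) == 1]
# Fix t = 2 bits; the projection on the remaining 3 bits should be (4 - 2)-wise independent.
projected = condition_and_project(parity, {0: 1, 1: -1})
print(is_k_wise_independent(parity, 4), is_k_wise_independent(projected, 2))
```

The example also shows the bound is tight here: the projected distribution is 2-wise but not 3-wise independent.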
Throughout the proof, $\mathcal{D}$ denotes a $(K_d + L_d)$-wise independent distribution over $\{-1,1\}^n$.
Consider a random walk on the tree $T$. Let $LD(T, \mathcal{D})$ (resp. $LD(T, \mathcal{U})$) be the leaf that the
random walk will reach when the inputs are drawn from the distribution $\mathcal{D}$ (resp. the uniform
distribution). The following straightforward lemma quantifies the intuition that these distributions
are the same. This holds because the tree has small depth and $\mathcal{D}$ has sufficient independence.

Lemma E.3. For any leaf $\rho \in L(T)$ we have $\Pr[LD(T, \mathcal{D}) = \rho] = \Pr[LD(T, \mathcal{U}) = \rho]$.

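The following lemma says that, if $\rho$ is a good leaf, the distribution induced by $\mathcal{D}$ on $\rho$ $O(\varepsilon)$-fools
the restricted subfunction $f_\rho$.

Lemma E.4. Let $\rho \in GL(T)$ be a good leaf and consider the projection $\mathcal{D}_{[n] \setminus \rho}$ of $\mathcal{D}$ on the
variables not in $\rho$. Then we have $|\Pr_{x \sim \mathcal{D}_{[n] \setminus \rho}}[f_\rho(x) = 1] - \Pr_{y \sim \mathcal{U}_{[n] \setminus \rho}}[f_\rho(y) = 1]| \le 2\varepsilon$.
Proof. If $f_\rho$ is $\tau$-regular, by Fact E.2 and recalling that $|\rho| \le \mathrm{depth}(d, \tau) \le L_d$, the distribution
$\mathcal{D}_{[n] \setminus \rho}$ is $K_d$-wise independent. Hence, the statement follows by assumption. Otherwise, $f_\rho$ is $\varepsilon$-
close to a constant, i.e. there exists $b \in \{-1, 1\}$ so that for any $t = O(d \log(1/\tau))$-wise independent distribution
$\mathcal{D}'$ over $\{-1,1\}^{n-|\rho|}$ we have $\Pr_{x \sim \mathcal{D}'}[f_\rho(x) \ne b] \le \tau$ (*). Since $K_d \gg t$, Fact E.2 implies that
(*) holds both under $\mathcal{D}_{[n] \setminus \rho}$ and $\mathcal{U}_{[n] \setminus \rho}$, hence the statement follows in this case also, recalling that
$\tau \le \varepsilon$. ■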

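The proof of Theorem 7.1 now follows by a simple averaging argument. By the decision-tree
decomposition of Theorem E.1, we can write

$\Pr_{x \sim \mathcal{D}'}[f(x) = 1] = \sum_{\rho \in L(T)} \Pr[LD(T, \mathcal{D}') = \rho] \cdot \Pr_{y \sim \mathcal{D}'_{[n] \setminus \rho}}[f_\rho(y) = 1]$

where $\mathcal{D}'$ is either $\mathcal{D}$ or the uniform distribution $\mathcal{U}$. By Theorem E.1 and Lemma E.3 it follows
that the probability mass of the bad leaves is at most $\varepsilon$ under both distributions. Therefore, by
Lemma E.3 and Lemma E.4 we get

$|\Pr_{x \sim \mathcal{D}}[f(x) = 1] - \Pr_{x \sim \mathcal{U}}[f(x) = 1]| \le \varepsilon + \sum_{\rho \in GL(T)} \Pr[LD(T, \mathcal{U}) = \rho] \cdot |\Pr_{y \sim \mathcal{U}_{[n] \setminus \rho}}[f_\rho(y) = 1] - \Pr_{y \sim \mathcal{D}_{[n] \setminus \rho}}[f_\rho(y) = 1]| \le 3\varepsilon$.

This completes the proof of Theorem 7.1.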

E.2 Proof of Theorem E.1 In this section we provide the proof of Theorem E.1. For the sake
of completeness, we give below the relevant machinery from [15]. We note that over the hypercube
every polynomial can be assumed to be multilinear, and so whenever we discuss a polynomial in
this section it should be assumed to be multilinear. We start by defining the notion of the critical
index of a polynomial:

Definition E.5 (critical index). Let $p : \{-1,1\}^n \to \mathbb{R}$ and $\tau > 0$. Assume the variables are
ordered such that $\mathrm{Inf}_i(p) \ge \mathrm{Inf}_{i+1}(p)$ for all $i \in [n - 1]$. The $\tau$-critical index of $p$ is the least $i$ such
that:

$\frac{\mathrm{Inf}_{i+1}(p)}{\sum_{j=i+1}^{n} \mathrm{Inf}_j(p)} \le \tau$. (E.1)

If Eq. (E.1) does not hold for any $i$ we say that the $\tau$-critical index of $p$ is $+\infty$. If $p$ has $\tau$-critical
index 0, we say that $p$ is $\tau$-regular.
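
The definition translates directly into a short scan over the sorted influences. This is a minimal sketch, assuming the influences are already given as a list of nonnegative numbers; the function name `critical_index` is illustrative.

```python
import math

def critical_index(influences, tau):
    """Least i (starting from 0) with Inf_{i+1}(p) <= tau * sum_{j >= i+1} Inf_j(p),
    after sorting influences in non-increasing order; math.inf if no such i exists."""
    infl = sorted(influences, reverse=True)
    suffix = sum(infl)                 # sum_{j=i+1}^n Inf_j(p) for i = 0
    for i, inf_next in enumerate(infl):
        # infl[i] plays the role of Inf_{i+1}(p) in the 1-indexed notation of the text.
        if inf_next <= tau * suffix:
            return i
        suffix -= inf_next
    return math.inf

print(critical_index([8, 4, 2, 1, 1, 1, 1], 0.3))
```

A return value of 0 means the polynomial is $\tau$-regular in the sense of the definition.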
We will be concerned with polynomials $p$ of degree $d$. The work in [15] establishes useful random
restriction lemmas for low-degree polynomials. Roughly, they are as follows: Let $p$ be a degree-$d$
polynomial. If the $\tau$-critical index of $p$ is zero, then $f = \mathrm{sgn}(p)$ is $\tau$-regular and there is nothing to
prove.

• If the $\tau$-critical index of $p$ is "very large", then a random restriction of "few" variables causes
$f = \mathrm{sgn}(p)$ to become a "close-to-constant" function with probability $1/2^{O(d)}$. We stress that
the distance between functions is measured in [15] with respect to the uniform distribution
on inputs. As previously mentioned, we extend this statement to hold for any distribution
with sufficiently large independence.

• If the $\tau$-critical index of $p$ is positive but not "very large", then a random restriction of a
"small" number of variables - the variables with largest influence in $p$ - causes $p$ to become
"sufficiently" regular with probability $1/2^{O(d)}$.

Formally, we require the following lemma, which is a strengthening of Lemma 10 in [15]:

Lemma E.6. Let $p : \{-1,1\}^n \to \mathbb{R}$ be a degree-$d$ polynomial and assume that its variables are in
order of non-increasing influence. Let $0 < \tau', \beta < 1/2$ be parameters. Fix $\alpha = \Theta(d \log\log(1/\beta) +
d \log d)$ and $\tau'' = \tau' \cdot (C d \ln d \ln(1/\tau'))^d$, where $C$ is a universal constant. One of the following
statements holds true:

1. The function $f = \mathrm{sgn}(p)$ is $\tau'$-regular.

2. With probability at least $1/2^{O(d)}$ over a random restriction $\rho$ fixing the first $L' = \alpha/\tau'$
variables of $p$, the function $f_\rho = \mathrm{sgn}(p_\rho)$ is $\beta$-close to a constant function. In particular,
under any $O(d \log(1/\beta))$-wise independent distribution $\mathcal{D}'$ there exists $b \in \{-1, 1\}$ such that
$\Pr_{x \sim \mathcal{D}'}[f_\rho(x) \ne b] \le \tau'$.

3. There exists a value $k \le \alpha/\tau'$, such that with probability at least $1/2^{O(d)}$ over a random
restriction $\rho$ fixing the first $k$ variables of $p$, the polynomial $p_\rho$ is $\tau''$-regular.

By applying the above lemma in a recursive manner we obtain Theorem E.1. This is done
exactly as in the proof of Theorem 1 in [15]. We remark that in every recursive application of the
lemma, the value of the parameter $\beta$ is set to $\tau$. This explains why $O(d \log(1/\tau))$-independence
suffices in the second statement of Theorem E.1. Hence, to complete the proof of Theorem E.1, it
suffices to establish Lemma E.6.

Proof (of Lemma E.6). We now sketch the proof of the lemma. The first statement of the lemma
corresponds to the case that the value $\ell$ of the $\tau'$-critical index is 0, the second to the case that it is
$\ell > L'$, and the third to $1 \le \ell \le L'$.

The proof of the second statement proceeds in two steps. Let $H$ denote the first $L'$ most
influential variables of $p$ and $T = [n] \setminus H$. Let $p'(x_H) = \sum_{S \subseteq H} \hat{p}(S) x_S$. We first argue that with
probability at least $2^{-O(d)}$ over a random restriction $\rho$ to $H$, the restricted polynomial $p_\rho(x_T)$ will
have a "large" constant term $\hat{p}_\rho(\emptyset) = p'(\rho)$, in particular at least $\theta = 2^{-O(d)}$. The proof is based
on the fact that, since the critical index is large, almost all of the Fourier weight of the polynomial
$p$ lies in $p'$, and it makes use of a certain anti-concentration property over the hypercube. Since
the randomness is over $H$ and the projection of $\mathcal{D}$ on those variables is still uniform, the argument
holds unchanged under $\mathcal{D}$.
In the second step, by an application of a concentration bound, we show that for at least half of
these restrictions to $H$ the surviving (non-constant) coefficients of $p_\rho$, i.e. the Fourier coefficients of
the polynomial $p_\rho(x_T) - p'(\rho)$, have small $\ell_2$ norm; in particular, we get that $\|p_\rho - p'(\rho)\|_2 \le \log(1/\beta)^{-d}$.
We call such restrictions good. Since the projection of $\mathcal{D}$ on these "head" variables is uniform, the
concentration bound applies as is.

Finally, we need to show that, for the good restrictions, the event that the "tail" variables $x_T$ change
the value of the function $f_\rho$, i.e. $\mathrm{sgn}(p_\rho(x_T) + p'(\rho)) \ne \mathrm{sgn}(p'(\rho))$, has probability at most $\beta$. This
event has probability at most

$\Pr_{x_T}[|p_\rho(x_T) - p'(\rho)| \ge \theta]$.

This is done in [15] using a concentration bound on the "tail", assuming full independence. Thus,
in this case, we need to modify the argument since the projection of $\mathcal{D}$ on the "tail" variables is
not uniform. However, a careful inspection of the parameters reveals that the concentration bound
needed above actually holds even under an assumption of $O(d \log(1/\beta))$-independence for the "tail"
$x_T$. In particular, given the upper bound on $\|p_\rho - p'(\rho)\|_2$ and the lower bound on $\theta$, it suffices to
apply Theorem B.4 for $t = \log(1/\beta)^{d/2}$, which only requires $O(d \log(1/\beta))$-wise independence. Hence, we
are done in this case too.

The proof of the third statement remains essentially unchanged for the following reason: One
proceeds by considering a random restriction of the variables of $p$ up to the $\tau'$-critical index - which
in this case is small. Hence, the distribution induced by $\mathcal{D}$ on this space is still uniform. Since the
randomness is over these "head" variables, all the arguments remain intact and the claim follows.
■

F Appendix to Section 8

We show a generalization of Theorem 8.1 to the intersection of $m > 1$ halfspaces, which implies
Theorem 8.1 as the special case $m = 2$.

Theorem 8.1 (restatement). Let $m > 1$ be an integer. Let $H_i = \{x : \langle a_i, x \rangle > \theta_i\}$ for $i \in [m]$,
with $\|a_i\|_2 = 1$ for all $i$. Let $X$ be a vector of $n$ i.i.d. Gaussians, and $Y$ be a vector of $k$-wise
independent Gaussians. Then for $k = \Omega(m^6/\varepsilon^2)$,

$|\Pr[X \in \cap_{i=1}^{m} H_i] - \Pr[Y \in \cap_{i=1}^{m} H_i]| \le \varepsilon$

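Proof. Let $F : \mathbb{R}^n \to \mathbb{R}^m$ be the map $F(x) = (\langle a_1, x \rangle, \ldots, \langle a_m, x \rangle)$, and let $R$ be the region
$\{x : \forall i \; x_i > \theta_i\}$. Similarly as in the proof of Theorem 6.1, we simply show a chain of inequalities
after setting $\rho = \varepsilon/m$ and $c = m/\rho$:

$\mathbf{E}[I_R(F(X))] \approx_\varepsilon \mathbf{E}[\tilde{I}^c_R(F(X))] \approx_\varepsilon \mathbf{E}[\tilde{I}^c_R(F(Y))] \approx_\varepsilon \mathbf{E}[I_R(F(Y))]$. (F.1)

Note the maximum influence $\tau$ does not play a role since under the Gaussian measure we never
need invoke the Invariance Principle. For the first inequality, observe $d_2(x, \partial R) \ge \min_i\{|x_i - \theta_i|\}$.
Then by a union bound,

$\Pr[d_2(F(X), \partial R) \le w] \le \Pr[\min_i\{|\langle a_i, X \rangle - \theta_i|\} \le w] \le \sum_{i=1}^{m} \Pr[|\langle a_i, X \rangle - \theta_i| \le w]$,

which is $O(mw)$ by Theorem B.2 with $d = 1$. Now,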
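$|\mathbf{E}[I_R(F(X))] - \mathbf{E}[\tilde{I}^c_R(F(X))]| \le \mathbf{E}[|I_R(F(X)) - \tilde{I}^c_R(F(X))|]$

$\le \Pr[d_2(F(X), \partial R) \le 2\rho] + O\left(\sum_{s=1}^{\infty} \frac{m^2}{c^2 2^{2s} \rho^2} \cdot \Pr[d_2(F(X), \partial R) \le 2^{s+1}\rho]\right)$ (F.2)

$= \Pr[d_2(F(X), \partial R) \le 2\rho] + O\left(\sum_{s=1}^{\infty} 2^{-2s} \cdot \Pr[d_2(F(X), \partial R) \le 2^{s+1}\rho]\right)$

$= O(m\rho)$
$= O(\varepsilon)$

where Eq. (F.2) follows from Theorem 4.10.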

The last inequality in Eq. (F.1) is argued identically, except that we need to have anticoncentration
of the $|\langle a_i, Y \rangle|$ in intervals of size no smaller than $\rho = \varepsilon/m$; this was already shown to
hold under $O(1/\rho^p)$-wise independence in [Lemma 2.5] for any $p$-stable distribution, and the
Gaussian is $p$-stable for $p = 2$.

For the middle inequality we use Taylor's theorem, as was done in Lemma 6.3. If we truncate
the Taylor polynomial at degree-$(k - 1)$ for $k$ even, then by our derivative bounds on mixed partials
of $\tilde{I}^c_R$ from Theorem 4.8, the error term is bounded by

kl 

with the inequality holding by Lemma 5.2, and the $m^k$ arising as the analogue of the $4^k$ term that
arose in Eq. (6.1). This is at most $\varepsilon$ for $k$ a sufficiently large constant times $(cm)^2$, and thus overall
$k = \Omega(m^6/\varepsilon^2)$-wise independence suffices. ■

Remark F.1. Several improvements are possible to reduce the dependence on $m$ in Theorem 8.1.
We presented the simplest proof we are aware of which obtains a polynomial dependence on $m$, for
clarity of exposition. See Section G.2 for an improvement on the dependence on $m$ to quartic.

Our approach can also show that bounded independence fools the intersection of any constant
number $m$ of degree-2 threshold functions. Suppose the degree-2 polynomials are $p_1, \ldots, p_m$.
Exactly as in Section 6, we decompose each $p_i$ into $p_{i,1} - p_{i,2} + p_{i,3} + p_{i,4} + C_i$. We then define a
region $R \subset \mathbb{R}^{4m}$ by $\{x : \forall i \in [m] \; x_{i,1}^2 - x_{i,2}^2 + x_{i,3} + x_{i,4} + C_i + \mathrm{tr}(A_{p_{i,3}}) > 0\}$, and the map
$F : \mathbb{R}^n \to \mathbb{R}^{4m}$ by

$F(x) = (M_{p_1}(x), \ldots, M_{p_m}(x))$

for the map $M_p : \mathbb{R}^n \to \mathbb{R}^4$ defined in Section 6. The goal is then to show $\mathbf{E}[I_R(F(X))] \approx_\varepsilon
\mathbf{E}[I_R(F(Y))]$, which is done identically as in the proof of Theorem 6.1. We simply state the
theorem here:

Theorem F.2. Let $m > 1$ be an integer. Let $H_i = \{x : p_i(x) > 0\}$ for $i \in [m]$, for some degree-2
polynomials $p_i : \mathbb{R}^n \to \mathbb{R}$. Let $X$ be a vector of $n$ i.i.d. Gaussians, and $Y$ be a vector of $k$-wise
independent Gaussians with $k = \Omega(\mathrm{poly}(m)/\varepsilon^8)$. Then,

$|\Pr[X \in \cap_{i=1}^{m} H_i] - \Pr[Y \in \cap_{i=1}^{m} H_i]| \le \varepsilon$
Identical conclusions also hold for $X, Y$ being drawn from $\{-1,1\}^n$, since we can apply the
decision tree argument from Theorem E.1 to each of the $m$ polynomial threshold functions separately
so that, by a union bound, with probability at least $1 - m\tau'$ each of the $m$ PTF restrictions is either
$\tau'$-close to a constant function, or is $\tau'$-regular. Thus for whatever setting of $\tau$ sufficed for the case
m = 1 (r = for halfspaces [TJ] and t = for degree-2 threshold functions (Theorem 16. ip ). we 
set t' = r/m then argue identically as before. 

G Various Quantitative Improvements 

In the main body of the paper, at various points we sacrificed proving sharper bounds in exchange 
for clarity of exposition. Here we discuss various quantitative improvements that can be made in 
our arguments. 

G.1 Improved FT-mollification In Theorem 4.8, we showed that for $F : \mathbb{R}^d \to \mathbb{R}$ bounded
and $c > 0$ arbitrary, $\|\partial^\beta \tilde{F}^c\|_\infty \le \|F\|_\infty \cdot (2c)^{|\beta|}$ for all $\beta \in \mathbb{N}^d$. We here describe an improvement
to this bound. The improvement comes by sharpening our bound on $\|\partial^\beta B\|_1$.
We use the following fact, whose proof can be found in [16].

Fact G.1. For any multi-index $\alpha \in \mathbb{N}^d$,




fo 



if some is odd 



otherwise 



The following lemma is used in our sharpening of the upper bound on $\|\partial^\beta B\|_1$.

Lemma G.2. For a multi-index $\alpha \in \mathbb{N}^d$,




Proof. By Fact G.1,




a\+d'[ T{\a\ + i) r(|a| + f + i) 




+ 



Eti (n,v.r(«, + i))r(a, + |) 




Write the above expression as 



2Cd 



• [W{a) - X{a) + Y{a) + Z{a)]. 



(G.l) 



a\ + d 
For $\alpha = 0$ we have



^(0) = —r, ^(0) = • — ^ ^, 1^(0) = d(d - 1) ^ ^, Z(0) = d • , , 

r(f)' r(f + i)' ^' ^ ^ 4-r(f + |)' 4-r(f + |; 

Using the fact that $\Gamma(z + 1) = z\Gamma(z)$, we can rewrite these as

W{0) 1 X(0) d Y{0) d{d-l) Z{0) 3d 



7r'^/2 r(f)' 7r"'^/2 r(f + i)' 7r-'^/2 2((i + 1) • r(f + i) ' 7r-'^/2 2(d + 1) • r(f + i) 

We thus have VF(0)-X(0)+y(0) + Z(0) = n{W {0) +Y (0) + Z {0)) . Since 2Crf(VF(0)-X(0)+y(0) + 
Z(0))/d = = 1, it thus suffices to show that {W{a) + Y{a) + Z{a))/{W{0) + y(0) + Z{0)) < 

general a. This can be seen just by showing the desired inequahty 
for W{a)/W{0), Y{a)/Y{0), and Z{a)/Z{0) separately. We do the calculation for W{a)/W{0) 
here; the others are similar. 
We have 

2-oW «! . 20(l«l+'^) 

and thus 



Vr(0) - (|a| - (|a| ■ 



Lemma G.3. For any $\beta \in \mathbb{N}^d$ with $|\beta| = \Omega(d)$, $\|\partial^\beta B\|_1 \le 2^{O(|\beta|)} \cdot \sqrt{\beta! \cdot |\beta|^{-|\beta|}}$.

Proof. The proof is nearly identical to the proof of Lemma 4.5. The difference is in our bound
of $\|x^\alpha \cdot b\|_2$. In the proof of Lemma 4.5, we just used that $\|x^\alpha \cdot b\|_2 \le \|b\|_2 = 1$. However, by
Lemma G.2 we can obtain the sharper bound



\K ■ b\\2 < 20(l"l+'i)^a! . (|a| 

We then have 

Wx"" ■ bh ■ \\x^-" ■ bh < 2^(l'^iya! • + • (/3 - a)! • - a\ + d)-(\l^-'^\+d) 



< 2*- 



We now have the following sharpening of item (i) from Theorem 4.8. Over high dimension, for
some $\beta$ the improvement can be as large as a shrinking of our upper bound in Theorem 4.8 by a
$d^{-|\beta|/2}$ factor (for example, when each $\beta_i$ is $|\beta|/d$).

Theorem G.4. Let $F : \mathbb{R}^d \to \mathbb{R}$ be bounded and $c > 0$ be arbitrary, and $\beta \in \mathbb{N}^d$ have $|\beta| = \Omega(d)$.
Then,

$\|\partial^\beta \tilde{F}^c\|_\infty \le \|F\|_\infty \cdot c^{|\beta|} \cdot 2^{O(|\beta|)} \cdot \sqrt{\beta! \cdot |\beta|^{-|\beta|}}$

Proof. Note in Eq. (4.6) in the proof of Theorem 4.8, we showed that $\|\partial^\beta \tilde{F}^c\|_\infty \le \|F\|_\infty \cdot c^{|\beta|} \cdot \|\partial^\beta B\|_1$.
The claim then follows by applying Lemma G.3 to bound $\|\partial^\beta B\|_1$. ■
G.2 Improvements to fooling the intersection of halfspaces

In the proof of Theorem 8.1 in Section F, we presented a proof showing that $\Omega(m^6/\varepsilon^2)$-
independence $\varepsilon$-fools the intersection of $m$ halfspaces under the Gaussian measure. In fact, this
dependence on $m$ can be improved to quartic. One factor of $m$ is shaved by using the improved
bound from Theorem G.4, and another factor of $m$ is shaved by a suitable change of basis. The
argument used to shave the second factor of $m$ is specific to the Gaussian case, and does not carry
over to the Bernoulli setting.

Theorem G.5. Let $m > 1$ be an integer. Let $H_i = \{x : \langle a_i, x \rangle > \theta_i\}$ for $i \in [m]$, with $\|a_i\|_2 = 1$
for all $i$. Let $X$ be a vector of $n$ independent standard normals, and $Y$ be a vector of $k$-wise
independent Gaussians. Then for $k = \Omega(m^4/\varepsilon^2)$ and even,

$|\Pr[X \in \cap_{i=1}^{m} H_i] - \Pr[Y \in \cap_{i=1}^{m} H_i]| \le \varepsilon$

Proof. Let $v_1, \ldots, v_m \in \mathbb{R}^n$ be an orthonormal basis for a linear space containing the $a_i$. Define
the region $R = \{x : \forall i \in [m] \; \sum_{j=1}^{m} \langle a_i, v_j \rangle x_j > \theta_i\}$. Note $R$ is itself the intersection of $m$
halfspaces in $\mathbb{R}^m$, with the $i$th halfspace having normal vector $b_i \in \mathbb{R}^m$ with $(b_i)_j = \langle a_i, v_j \rangle$.

We now define the map $F : \mathbb{R}^n \to \mathbb{R}^m$ by $F(x) = (\langle x, v_1 \rangle, \ldots, \langle x, v_m \rangle)$. It thus suffices to show
that $\mathbf{E}[I_R(F(X))] \approx_\varepsilon \mathbf{E}[I_R(F(Y))]$. We do this by a chain of inequalities, similarly as in the proof
of Theorem 8.1. Below we set $c = m^2/\varepsilon$.
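
The change of basis can be sanity-checked numerically. The sketch below builds an orthonormal basis for the span of the $a_i$ with modified Gram-Schmidt and confirms that $\langle a_i, x \rangle = \langle b_i, F(x) \rangle$. The vectors `a` and `x` are made-up example data, and the helper names are illustrative only.

```python
import math

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def gram_schmidt(vectors, tol=1e-12):
    """Orthonormal basis for span(vectors) via modified Gram-Schmidt.
    Returns fewer vectors than given if the inputs are linearly dependent."""
    basis = []
    for a in vectors:
        r = list(a)
        for v in basis:
            coef = dot(r, v)
            r = [ri - coef * vi for ri, vi in zip(r, v)]
        norm = math.sqrt(dot(r, r))
        if norm > tol:
            basis.append([ri / norm for ri in r])
    return basis

# Example data (assumed for illustration): m = 3 unit normals in R^4 and a test point x.
a = [[1.0, 0.0, 0.0, 0.0], [0.6, 0.8, 0.0, 0.0], [0.0, 0.6, 0.0, 0.8]]
x = [0.3, -1.2, 2.0, 0.5]

V = gram_schmidt(a)                          # v_1, ..., v_m
Fx = [dot(x, v) for v in V]                  # F(x) = (<x, v_1>, ..., <x, v_m>)
B = [[dot(ai, v) for v in V] for ai in a]    # (b_i)_j = <a_i, v_j>
lhs = [dot(ai, x) for ai in a]               # <a_i, x>
rhs = [dot(bi, Fx) for bi in B]              # <b_i, F(x)>
print(lhs, rhs)
```

Because the $v_j$ are orthonormal, the coordinates of $F(X)$ are independent standard normals whenever $X$ is, which is what the Gaussian-specific part of the argument relies on.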

For the first and last inequalities, since we performed an orthonormal change of basis
the $F(X)_i$ remain independent standard normals, and we can reuse the same analysis from the
proof of Theorem 8.1 without modification.

For the middle inequality we use Taylor's theorem. If we set $R(F(x)) = |\tilde{I}^c_R(F(x)) - P_{k-1}(F(x))|$
for $P_{k-1}$ the degree-$(k - 1)$ Taylor polynomial approximating $\tilde{I}^c_R$, then



n 



i=l 



< 



20{k) . pfc WT-l kil'^' 



by Theorem G.4. Now note

En"' \t-\i^^ 1 ^ , fk 



|/3|=fc |/3|=fc 



< 



nO{k) . um ^ 



< 



^ ^'^ V/3 

m=k 

m=k/2 ^ 
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20(fe) 



(G.5) 



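where $|x|$ denotes the vector $(|x_1|, \ldots, |x_m|)$. Eq. (G.4) holds for the following reason. Let $\beta \in \mathbb{N}^m$
be arbitrary. Since $k$ is even, the number of odd $\beta_i$ must be even. Let $M$ be any perfect matching
of the indices $i$ with odd $\beta_i$. Then for $(i,j) \in M$, either $|x_i|^{\beta_i+1}|x_j|^{\beta_j-1}$ or $|x_i|^{\beta_i-1}|x_j|^{\beta_j+1}$ must be
at least as large as $|x_i|^{\beta_i}|x_j|^{\beta_j}$. Let $\beta'$ be the new multi-index with only even indices obtained by
making all such replacements for $(i,j) \in M$. We then replace $\sqrt{\beta!} \cdot \binom{k}{\beta} \cdot |x|^\beta$ in the summation with
$\sqrt{\beta'!} \cdot \binom{k}{\beta'} \cdot |x|^{\beta'}$. In doing so, we have $|x|^{\beta'} \ge |x|^\beta$, but $\sqrt{\beta'!} \cdot \binom{k}{\beta'}$ may have decreased from $\sqrt{\beta!} \cdot \binom{k}{\beta}$,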

but by at most a 2^^^^K^ factor since each /3j decreased by at most 1 and is at most k. Also, in 
making all such replacements over all $\beta \in \mathbb{N}^m$, we must now count each $\beta$ with even coordinates
at most $3^m$ times, since no such $\beta$ can be mapped to by more than $3^m$ other multi-indices (if we
replaced some multi-index with $\beta$, that multi-index must have its $i$th coordinate either one larger,
one smaller, or exactly equal to $\beta_i$ for each $i$). Note subsequent inequalities dropped the $k^m$ term
in the numerator since

20(fc) . ^ 20(^) for our choice of k. 

Now by Eq. 



E 



< 



20(fc) 



E 



k/2 



Since the $F(X)_i$ are independent standard normal random variables, $\sum_{i=1}^{m} F(X)_i^2$ follows a chi-
squared distribution with $m$ degrees of freedom, and its $k/2$th moment is determined by $k$-wise
independence, and thus



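$\mathbf{E}\left[\left(\sum_{i=1}^{m} F(Y)_i^2\right)^{k/2}\right] = \frac{2^{k/2}\,\Gamma(k/2 + m/2)}{\Gamma(m/2)} \le 2^{O(k)} \cdot (k + m)^{k/2} \le 2^{O(k)} \cdot k^{k/2}$ (G.6)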
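
The closed form for the chi-squared moment can be checked numerically. This is a minimal sketch using only the standard formula $\mathbf{E}[(\chi^2_m)^{k/2}] = 2^{k/2}\,\Gamma(k/2 + m/2)/\Gamma(m/2)$ for even $k$; the function names are illustrative. For even $k$ the same quantity equals the integer product $\prod_{j=0}^{k/2-1}(m + 2j)$, which makes the bound by $(k+m)^{k/2}$ easy to see.

```python
import math

def chi2_half_moment(m: int, k: int) -> float:
    """2^(k/2) * Gamma(k/2 + m/2) / Gamma(m/2), computed in log space for stability."""
    log_val = (k / 2) * math.log(2) + math.lgamma(k / 2 + m / 2) - math.lgamma(m / 2)
    return math.exp(log_val)

def product_form(m: int, k: int) -> int:
    """Same quantity for even k, as the exact product prod_{j=0}^{k/2-1} (m + 2j)."""
    out = 1
    for j in range(k // 2):
        out *= m + 2 * j
    return out

for m, k in [(1, 2), (2, 4), (3, 6), (4, 16)]:
    print(m, k, chi2_half_moment(m, k), product_form(m, k))
```

Each factor $m + 2j$ is less than $k + m$, so the product is at most $(k+m)^{k/2}$, and at most $(2k)^{k/2}$ once $k \ge m$.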



This finishes our proof, since by Eq. (G.3) the expected value of our Taylor error is



20{k) . gfc 



E 



E 

\P\=k 



20(k) . pfc 
~~kkjr~ 



'20{k) 
JkJY 



. 20{k) . ^fc/2 



20(fc) . ^k 



which is $O(\varepsilon)$ for $k = \Omega(c^2) = \Omega(m^4/\varepsilon^2)$.